Re: [PATCH v3] usb: gadget: uvc: Fix crash when encoding data for usb request

From: Dan Vacura
Date: Fri Apr 29 2022 - 14:39:29 EST


Hi Laurent,

ccing Michael for comments about returning v4l2 bufs too early.

On Mon, Apr 25, 2022 at 01:58:52AM +0300, Laurent Pinchart wrote:
> Hi Dan,
>
> On Wed, Apr 20, 2022 at 04:27:27PM -0500, Dan Vacura wrote:
> > On Tue, Apr 19, 2022 at 11:46:37PM +0300, Laurent Pinchart wrote:
> > >
> > > This indeed fixes an issue, so I think we can merge the patch, but I
> > > also believe we need further improvements on top (of course if you would
> > > like to improve the implementation in a v4, I won't complain :-))
> >
> > It looks like Greg has already accepted the change and it's in
> > linux-next. We can discuss here how to better handle these -EXDEV errors
> > for future improvements, as it seems like it's been an issue in the past
> > as well:
> > https://www.mail-archive.com/linux-usb@xxxxxxxxxxxxxxx/msg105615.html
> >
> > > As replied in v2 (sorry for the late reply), it seems that this error
> > > can occur under normal conditions. This means we shouldn't cancel the
> > > queue, at least when the error is intermitent (if all URBs fail that's
> > > another story).
> >
> > My impression was that canceling the queue was still necessary as we may
> > be in progress for the current frame. Perhaps we don't need to flush all
> > the frames from the queue, but at a minimum we need to reset the
> > buf_used value.
>
> I think we have three classes of errors:
>
> - "Packet-level" errors, resulting in either data loss or erroneous data
> being transferred to the host for one (or more) packets in a frame.
> When such errors occur, we should probably notify the application (on
> the gadget side), but we can continue sending the rest of the frame.
>
> - "Frame-level" errors, resulting in errors in the rest of the frame.
> When such an error occurs, we should notify the application, and stop
> sending data for the current frame, moving to the next frame.
>
> - "Stream-level" errors, resulting in errors in all subsequent frames.
> When such an error occurs, we should notify the application and stop
> sending data until the application takes corrective measures.
>
> I'm not sure if packet-level errors make sense, if data is lost, maybe
> we would be better off just cancelling the current frame and moving to
> the next one.
>
> For both packet-level errors and frame-level errors, the buffer should
> be marked as erroneous to notify the application, but there should be no
> need to cancel the queue and drop all queued buffers. We can just move
> to the next buffer.
>
> For stream-level errors, I would cancel the queue, and additionally
> prevent new buffers from being queued until the application stops and
> restarts the stream.
>
> Finally, which class an error belongs to may not be an intrinsic
> property of the error itself, packet-level or frame-level errors that
> occur too often may be worth cancelling the queue (I'm not sure how to
> quantify "too often" though).
>
> Does this make sense ?

Yes, this makes sense, but I'm not sure if the USB controllers send back
that info and/or if it's all standardized. For example in the dwc3
controller I see the following status values returned: -EPIPE,
-ECONNRESET, -ESHUTDOWN, and -EXDEV; whereas in the musb controller
doesn't appear to return -EXDEV.

>
> > > We likely need to differentiate between -EXDEV and other errors in
> > > uvc_video_complete(), as I'd like to be conservative and cancel the
> > > queue for unknown errors. We also need to improve the queue cancellation
> > > implementation so that userspace gets an error when queuing further
> > > buffers.
> >
> > We already feedback to userspace the error, via the state of
> > vb2_buffer_done(). When userspace dequeues the buffer it can check if
> > v4l2_buffer.flags has V4L2_BUF_FLAG_ERROR to see if things failed, then
> > decide what to do like re-queue that frame. However, this appears to not
> > always occur since I believe the pump thread is independent of the
> > uvc_video_complete() callback. As a result, the complete callback of the
> > failed URB may be associated with a buffer that was already released
> > back to the userspace client.
>
> Good point. That would only be the case for errors in the last
> request(s) for a frame, right ?

>From logging it seems it can be up to the last few requests that come
back, where the queue is already empty. I guess this is timing dependent
on the hw, the system scheduling, and how deep the request queue is.

>
> > In this case, I don't know if there's
> > anything to be done, since a new buffer and subsequent URBs might
> > already be queued up. You suggested an error on a subsequent buffer
> > queue, but I don't know how helpful that'd be at this point, perhaps in
> > the scenario that all URBs are failing?
>
> Should we delay sending the buffer back to userspace until all the
> requests for the buffer have completed ?

Yes, I had that same thought later on. We'll either need a new
pending_complete_queue to move the buffer from the current queue, or
create logic to leverage the existing uvc_buffer_state flags; like
uvcg_queue_head() looks for the first buffer with UVC_BUF_STATE_QUEUED.
When the complete call comes in we'll have to identify if the request is
the last completed one for that buffer. It looks like the UVC_STREAM_EOF
flag will be present, hopefully that's sufficient to rely on. Also,
since this is a FIFO queue, we can assume that the first buffer with
UVC_BUF_STATE_DONE can be vb2_buffer_done()'d. If we implement this type
of logic then we can probably remove this change:
https://lore.kernel.org/r/20220402232744.3622565-3-m.grzeschik@xxxxxxxxxxxxxx
since its purpose is similar. How does that sound and do you have an
opinion on a new queue or creating logic around uvc_buffer_state flags?

>
> --
> Regards,
>
> Laurent Pinchart