Re: [RFC PATCH v3 4/6] media: tegra: Add Tegra210 Video input driver

From: Hans Verkuil
Date: Mon Feb 17 2020 - 03:04:28 EST


On 2/16/20 9:22 PM, Sowjanya Komatineni wrote:
>
> On 2/16/20 12:11 PM, Sowjanya Komatineni wrote:
>>
>> On 2/16/20 11:54 AM, Sowjanya Komatineni wrote:
>>>
>>> On 2/16/20 3:03 AM, Hans Verkuil wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 2/14/20 7:23 PM, Sowjanya Komatineni wrote:
>>>>> Tegra210 contains a powerful Video Input (VI) hardware controller
>>>>> which can support up to 6 MIPI CSI camera sensors.
>>>>>
>>>>> Each Tegra CSI port can be one-to-one mapped to VI channel and can
>>>>> capture from an external camera sensor connected to CSI or from
>>>>> built-in test pattern generator.
>>>>>
>>>>> Tegra210 supports built-in test pattern generator from CSI to VI.
>>>>>
>>>>> This patch adds a V4L2 media controller and capture driver support
>>>>> for Tegra210 built-in CSI to VI test pattern generator.
>>>>>
>>>>> Signed-off-by: Sowjanya Komatineni <skomatineni@xxxxxxxxxx>
>>>>> ---
>>>>> Â drivers/staging/media/KconfigÂÂÂÂÂÂÂÂÂÂÂÂÂ |ÂÂÂ 2 +
>>>>> Â drivers/staging/media/MakefileÂÂÂÂÂÂÂÂÂÂÂÂ |ÂÂÂ 1 +
>>>>> Â drivers/staging/media/tegra/KconfigÂÂÂÂÂÂÂ |ÂÂ 10 +
>>>>> Â drivers/staging/media/tegra/MakefileÂÂÂÂÂÂ |ÂÂÂ 8 +
>>>>> Â drivers/staging/media/tegra/TODOÂÂÂÂÂÂÂÂÂÂ |ÂÂ 10 +
>>>>> Â drivers/staging/media/tegra/tegra-common.h |Â 239 +++++++
>>>>> Â drivers/staging/media/tegra/tegra-csi.cÂÂÂ |Â 374 ++++++++++
>>>>> Â drivers/staging/media/tegra/tegra-csi.hÂÂÂ |Â 115 ++++
>>>>> Â drivers/staging/media/tegra/tegra-vi.cÂÂÂÂ | 1019
>>>>> ++++++++++++++++++++++++++++
>>>>> Â drivers/staging/media/tegra/tegra-vi.hÂÂÂÂ |ÂÂ 79 +++
>>>>>  drivers/staging/media/tegra/tegra-video.c | 118 ++++
>>>>>  drivers/staging/media/tegra/tegra-video.h | 32 +
>>>>> Â drivers/staging/media/tegra/tegra210.cÂÂÂÂ |Â 767
>>>>> +++++++++++++++++++++
>>>>> Â drivers/staging/media/tegra/tegra210.hÂÂÂÂ |Â 190 ++++++
>>>>> Â 14 files changed, 2964 insertions(+)
>>>>> Â create mode 100644 drivers/staging/media/tegra/Kconfig
>>>>> Â create mode 100644 drivers/staging/media/tegra/Makefile
>>>>> Â create mode 100644 drivers/staging/media/tegra/TODO
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra-common.h
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra-csi.c
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra-csi.h
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra-vi.c
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra-vi.h
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra-video.c
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra-video.h
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra210.c
>>>>> Â create mode 100644 drivers/staging/media/tegra/tegra210.h
>>>>>
>>>> <snip>
>>>>
>>>>> +/*
>>>>> + * videobuf2 queue operations
>>>>> + */
>>>>> +static int tegra_channel_queue_setup(struct vb2_queue *vq,
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned int *nbuffers,
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned int *nplanes,
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned int sizes[],
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ struct device *alloc_devs[])
>>>>> +{
>>>>> +ÂÂÂÂ struct tegra_vi_channel *chan = vb2_get_drv_priv(vq);
>>>>> +
>>>>> +ÂÂÂÂ if (*nplanes)
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂ return sizes[0] < chan->format.sizeimage ? -EINVAL : 0;
>>>>> +
>>>>> +ÂÂÂÂ *nplanes = 1;
>>>>> +ÂÂÂÂ sizes[0] = chan->format.sizeimage;
>>>>> +ÂÂÂÂ alloc_devs[0] = chan->vi->dev;
>>>>> +
>>>>> +ÂÂÂÂ /*
>>>>> +ÂÂÂÂÂ * allocate min 3 buffers in queue to avoid race between DMA
>>>>> +ÂÂÂÂÂ * writes and userspace reads.
>>>>> +ÂÂÂÂÂ */
>>>>> +ÂÂÂÂ if (*nbuffers < 3)
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂ *nbuffers = 3;
>>>> First of all, don't check this here, instead set the struct
>>>> vb2_queue field
>>>> 'min_buffers_needed' to 3 instead.
>>>>
>>>> But the reason given for this check is peculiar: there should not be
>>>> any
>>>> race at all. Usually the reason for requiring a specific minimum
>>>> number of
>>>> buffers is that the DMA engine needs at least 2 buffers before it
>>>> can start
>>>> streaming: it can't give back a buffer to userspace (vb2_buffer_done())
>>>> unless there is a second buffer it can start to capture to next. So
>>>> for many
>>>> DMA implementations you need a minimum of 2 buffers: two buffers for
>>>> the
>>>> DMA engine, one buffer being processed by userspace.
>>>>
>>>> If the driver is starved of buffers it will typically keep capturing to
>>>> the last buffer until a new buffer is queued.
>>>>
>>>> In any case, once the driver releases a buffer via vb2_buffer_done()
>>>> the
>>>> buffer memory is no longer owned by the driver.
>>>>
>>>> To be precise, buffer ownership is as follows:
>>>>
>>>> userspace -> VIDIOC_QBUF -> vb2 -> buf_queue -> driver ->
>>>> vb2_buffer_done() -> vb2 -> VIDIOC_DQBUF -> userspace
>>>>
>>>> (vb2 == videobuf2 framework)
>>>>
>>>> Note that vb2 never touches the buffer memory.
>>>>
>>>> So if you get a race condition in this driver, then there is something
>>>> strange going on. It looks like vb2_buffer_done() is called while
>>>> DMA is
>>>> still ongoing, or because the driver really needs to keep one buffer
>>>> available at all times.
>>>>
>>>> Regards,
>>>>
>>>> ÂÂÂÂÂÂÂÂ Hans
>>>
>>> Thanks Hans.
>>>
>>> On running v4l2-compliance streaming tests for longer run, I noticed
>>> kernel reporting unable to write to read-only memory and with debugs
>>> I observed when this error was reported, I see 2 buffers queued and
>>> both using same address.
>>>
>>> for first buffer capture start thread initiates capture and wakes
>>> done thread to wait for memory write ack and once its done buffer is
>>> released to user space but I see upon buffer released to user space
>>> immediate next buffer capture single shot gets issued (as soon as
>>> single shot is issued frame capture data is written to memory by DMA)
>>> and I see this kernel error of unable to write to read-only memory.
>>>
>>> This error happens rare and happens on long run and all the times of
>>> repro's, I see when other thread releases buffer immediate I see
>>> single shot gets issued as 2 buffers are queued up at the same time
>>> with same DMA address.
>>>
>> Just to be clear, I meant all the times when kernel reports error
>> unable to write to read-only memory, I see 2 buffers gets queued and
>> as the capture start thread and done thread are parallel and when
>> capture thread wakes done thread on receiving FS event, done thread
>> for waiting for memory write happens parallel to next frame capture
>> and I see while vb2_buffer_done happens in done thread next frame
>> single shot has been issues by capture start thread in parallel when
>> it hits this error.
>
> For low latency, we use 2 threads one thread for capture and wait for FS
> and on receiving FS even wakes other done thread to wait for memory
> write to finish.
>
> While other done thread waits for memory write to finish, capture thread
> can start capture for next frame and as soon as single shot is issued
> capture frame is written to memory and as this thread runs in parallel
> to done thread
>
> there is a possibility vb2_buffer_done being called by
> kthread_capture_done while DMA is ongoing by kthread_capture_start and I
> observed same DMA address being used got both buffers that got queued at
> same time when it hits this error.

"buffers that got queued": you mean that tegra_channel_buffer_queue() is
called twice with different buffers (i.e. with different buffer index values)
but with the same DMA address?

That should not happen (unless the first buffer was returned with
vb2_buffer_done() before the second buffer was queued).

Can you provide more details? E.g. buffer index, memory model used when
streaming, total number of buffers allocated by REQBUFS.

I would like to fully understand this. Just increasing the minimum number
of buffers, while reasonable by itself, *does* feel like it is just
hiding the symptoms.

Regards,

Hans

>
>>> With using minimum 3 buffers, this issue doesnt happen at all from
>>> almost 72 hours of testing.
>>>
>>>
>>> Will try with setting vb2 queue field min_buffers_needed as 3 instead
>>> of adding check in queue setup.
>>>
>>
>>>
>>>>> +
>>>>> +ÂÂÂÂ return 0;
>>>>> +}