Re: [PATCH 0/3] Collapse vimc into single monolithic driver

From: Helen Koike
Date: Mon Aug 12 2019 - 19:41:49 EST


Hi Shuah,

On 8/12/19 7:14 PM, Shuah Khan wrote:
> On 8/12/19 1:10 PM, Shuah Khan wrote:
>> On 8/12/19 12:52 PM, Andrà Almeida wrote:
>>> Hi Shuah,
>>>
>>> On 8/12/19 11:08 AM, Shuah Khan wrote:
>>>> On 8/9/19 9:51 PM, Helen Koike wrote:
>>>>> Hi Andre,
>>>>>
>>>>> Thanks for testing this.
>>>>>
>>>>> On 8/9/19 9:24 PM, Andrà Almeida wrote:
>>>>>> On 8/9/19 9:17 PM, Shuah Khan wrote:
>>>>>>> Hi Andre,
>>>>>>>
>>>>>>> On 8/9/19 5:52 PM, Andrà Almeida wrote:
>>>>>>>> Hello Shuah,
>>>>>>>>
>>>>>>>> Thanks for the patch, I did some comments below.
>>>>>>>>
>>>>>>>> On 8/9/19 6:45 PM, Shuah Khan wrote:
>>>>>>>>> vimc uses Component API to split the driver into functional
>>>>>>>>> components.
>>>>>>>>> The real hardware resembles a monolith structure than component and
>>>>>>>>> component structure added a level of complexity making it hard to
>>>>>>>>> maintain without adding any real benefit.
>>>>>>>>> ÂÂÂÂÂÂ The sensor is one vimc component that would makes sense to be a
>>>>>>>>> separate
>>>>>>>>> module to closely align with the real hardware. It would be easier to
>>>>>>>>> collapse vimc into single monolithic driver first and then split the
>>>>>>>>> sensor off as a separate module.
>>>>>>>>>
>>>>>>>>> This patch series emoves the component API and makes minimal
>>>>>>>>> changes to
>>>>>>>>> the code base preserving the functional division of the code
>>>>>>>>> structure.
>>>>>>>>> Preserving the functional structure allows us to split the sensor off
>>>>>>>>> as a separate module in the future.
>>>>>>>>>
>>>>>>>>> Major design elements in this change are:
>>>>>>>>> ÂÂÂÂÂÂ - Use existing struct vimc_ent_config and struct
>>>>>>>>> vimc_pipeline_config
>>>>>>>>> ÂÂÂÂÂÂÂÂ to drive the initialization of the functional components.
>>>>>>>>> ÂÂÂÂÂÂ - Make vimc_ent_config global by moving it to vimc.h
>>>>>>>>> ÂÂÂÂÂÂ - Add two new hooks add and rm to initialize and register,
>>>>>>>>> unregister
>>>>>>>>> ÂÂÂÂÂÂÂÂ and free subdevs.
>>>>>>>>> ÂÂÂÂÂÂ - All component API is now gone and bind and unbind hooks are
>>>>>>>>> modified
>>>>>>>>> ÂÂÂÂÂÂÂÂ to do "add" and "rm" with minimal changes to just add and rm
>>>>>>>>> subdevs.
>>>>>>>>> ÂÂÂÂÂÂ - vimc-core's bind and unbind are now register and unregister.
>>>>>>>>> ÂÂÂÂÂÂ - vimc-core invokes "add" hooks from its
>>>>>>>>> vimc_register_devices().
>>>>>>>>> ÂÂÂÂÂÂÂÂ The "add" hooks remain the same and register subdevs. They
>>>>>>>>> don't
>>>>>>>>> ÂÂÂÂÂÂÂÂ create platform devices of their own and use vimc's
>>>>>>>>> pdev.dev as
>>>>>>>>> ÂÂÂÂÂÂÂÂ their reference device. The "add" hooks save their
>>>>>>>>> vimc_ent_device(s)
>>>>>>>>> ÂÂÂÂÂÂÂÂ in the corresponding vimc_ent_config.
>>>>>>>>> ÂÂÂÂÂÂ - vimc-core invokes "rm" hooks from its unregister to
>>>>>>>>> unregister
>>>>>>>>> subdevs
>>>>>>>>> ÂÂÂÂÂÂÂÂ and cleanup.
>>>>>>>>> ÂÂÂÂÂÂ - vimc-core invokes "add" and "rm" hooks with pointer to struct
>>>>>>>>> vimc_device
>>>>>>>>> ÂÂÂÂÂÂÂÂ and the corresponding struct vimc_ent_config pointer.
>>>>>>>>> ÂÂÂÂÂÂ The following configure and stream test works on all devices.
>>>>>>>>> ÂÂÂÂÂÂ ÂÂÂÂ media-ctl -d platform:vimc -V '"Sensor
>>>>>>>>> A":0[fmt:SBGGR8_1X8/640x480]'
>>>>>>>>> ÂÂÂÂÂÂ media-ctl -d platform:vimc -V '"Debayer
>>>>>>>>> A":0[fmt:SBGGR8_1X8/640x480]'
>>>>>>>>> ÂÂÂÂÂÂ media-ctl -d platform:vimc -V '"Sensor
>>>>>>>>> B":0[fmt:SBGGR8_1X8/640x480]'
>>>>>>>>> ÂÂÂÂÂÂ media-ctl -d platform:vimc -V '"Debayer
>>>>>>>>> B":0[fmt:SBGGR8_1X8/640x480]'
>>>>>>>>> ÂÂÂÂÂÂ ÂÂÂÂ v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v
>>>>>>>>> width=1920,height=1440
>>>>>>>>> ÂÂÂÂÂÂ v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v
>>>>>>>>> pixelformat=BA81
>>>>>>>>> ÂÂÂÂÂÂ v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v
>>>>>>>>> pixelformat=BA81
>>>>>>>>> ÂÂÂÂÂÂ ÂÂÂÂ v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video1
>>>>>>>>> ÂÂÂÂÂÂ v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video2
>>>>>>>>> ÂÂÂÂÂÂ v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video3
>>>>>>>>>
>>>>>>>>> The third patch in the series fixes a general protection fault found
>>>>>>>>> when rmmod is done while stream is active.
>>>>>>>>
>>>>>>>> I applied your patch on top of media_tree/master and I did some
>>>>>>>> testing.
>>>>>>>> Not sure if I did something wrong, but just adding and removing the
>>>>>>>> module generated a kernel panic:
>>>>>>>
>>>>>>> Thanks for testing.
>>>>>>>
>>>>>>> Odd. I tested modprobe and rmmod both.I was working on Linux 5.3-rc2.
>>>>>>> I will apply these to media latest and work from there. I have to
>>>>>>> rebase these on top of the reverts from Lucas and Helen
>>>>>>
>>>>>> Ok, please let me know if I succeeded to reproduce.
>>>>>>
>>>>>>>>
>>>>>>>> ~# modprobe vimc
>>>>>>>> ~# rmmod vimc
>>>>>>>> [ÂÂ 16.452974] stack segment: 0000 [#1] SMP PTI
>>>>>>>> [ÂÂ 16.453688] CPU: 0 PID: 2038 Comm: rmmod Not tainted 5.3.0-rc2+ #36
>>>>>>>> [ÂÂ 16.454678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>>>>> BIOS 1.12.0-20181126_142135-anatol 04/01/2014
>>>>>>>> [ÂÂ 16.456191] RIP: 0010:kfree+0x4d/0x240
>>>>>>>>
>>>>>>>> <registers values...>
>>>>>>>>
>>>>>>>> [ÂÂ 16.469188] Call Trace:
>>>>>>>> [ÂÂ 16.469666]Â vimc_remove+0x35/0x90 [vimc]
>>>>>>>> [ÂÂ 16.470436]Â platform_drv_remove+0x1f/0x40
>>>>>>>> [ÂÂ 16.471233]Â device_release_driver_internal+0xd3/0x1b0
>>>>>>>> [ÂÂ 16.472184]Â driver_detach+0x37/0x6b
>>>>>>>> [ÂÂ 16.472882]Â bus_remove_driver+0x50/0xc1
>>>>>>>> [ÂÂ 16.473569]Â vimc_exit+0xc/0xca0 [vimc]
>>>>>>>> [ÂÂ 16.474231]Â __x64_sys_delete_module+0x18d/0x240
>>>>>>>> [ÂÂ 16.475036]Â do_syscall_64+0x43/0x110
>>>>>>>> [ÂÂ 16.475656]Â entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>>> [ÂÂ 16.476504] RIP: 0033:0x7fceb8dafa4b
>>>>>>>>
>>>>>>>> <registers values...>
>>>>>>>>
>>>>>>>> [ÂÂ 16.484853] Modules linked in: vimc(-) videobuf2_vmalloc
>>>>>>>> videobuf2_memops v4l2_tpg videobuf2_v4l2 videobuf2_common
>>>>>>>> [ÂÂ 16.486187] ---[ end trace 91e5e0894e254d49 ]---
>>>>>>>> [ÂÂ 16.486758] RIP: 0010:kfree+0x4d/0x240
>>>>>>>>
>>>>>>>> <registers values...>
>>>>>>>>
>>>>>>>> fish: ârmmod vimcâ terminated by signal SIGSEGV (Address boundary
>>>>>>>> error)
>>>>>>>>
>>>>>>>> I just added the module after booting, no other action was made.
>>>>>>>> Here is
>>>>>>>> how my `git log --oneline` looks like:
>>>>>>>>
>>>>>>>> 897d708e922b media: vimc: Fix gpf in rmmod path when stream is active
>>>>>>>> 2e4a5ad8ad6d media: vimc: Collapse component structure into a single
>>>>>>>> monolithic driver
>>>>>>>> 7c8da1687e92 media: vimc: move private defines to a common header
>>>>>>>> 97299a303532 media: Remove dev_err() usage after platform_get_irq()
>>>>>>>> 25a3d6bac6b9 media: adv7511/cobalt: rename driver name to adv7511-v4l2
>>>>>
>>>>> I couldn't reproduce the error, my tree looks the same:
>>>>>
>>>>> [I] koike@floko ~/m/o/linux> git log --oneline
>>>>> e3345155c8ed (HEAD) media: vimc: Fix gpf in rmmod path when stream is
>>>>> active
>>>>> 43e9e2fe761f media: vimc: Collapse component structure into a single
>>>>> monolithic driver
>>>>> 8a6d0b9adde0 media: vimc: move private defines to a common header
>>>>> 97299a303532 (media/master) media: Remove dev_err() usage after
>>>>> platform_get_irq()
>>>>> 25a3d6bac6b9 media: adv7511/cobalt: rename driver name to adv7511-v4l2
>>>>
>>>> Thanks Helen for trying to reproduce and sharing the result.
>>>
>>> Me and Helen found out what is the problem. If you follow this call trace:
>>>
>>> vimc_ent_sd_unregister()
>>> v4l2_device_unregister_subdev()
>>> v4l2_subdev_release()
>>>
>>> You'll notice that this last function calls the `release` callback
>>> implementation of the subdevice. For instance, the `release` of
>>> vimc-sensor is this one:
>>>
>>> static void vimc_sen_release(struct v4l2_subdev *sd)
>>> {
>>> ÂÂÂÂstruct vimc_sen_device *vsen =
>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ container_of(sd, struct vimc_sen_device, sd);
>>>
>>> ÂÂÂÂv4l2_ctrl_handler_free(&vsen->hdl);
>>> ÂÂÂÂtpg_free(&vsen->tpg);
>>> ÂÂÂÂkfree(vsen);
>>> }
>>>
>>> And then you can see that `vsen` has been freed. Back to
>>> vimc_ent_sd_unregister(), after v4l2_device_unregister_subdev(), the
>>> function will call vimc_pads_cleanup(). This is basically a
>>> kfree(ved->pads), but `ved` has just been freed at
>>> v4l2_subdev_release(), producing a memory fault.
>>>
>>> To fix that, we found two options:
>>>
>>> - place the kfree(ved->pads) inside the release callback of each
>>> subdevice and removing vimc_pads_cleanup() from
>>> vimc_ent_sd_unregister()
>>> - use a auxiliary variable to hold the address of the pads, for instance:
>>>
>>> void vimc_ent_sd_unregister(...)
>>> {
>>> ÂÂÂÂ struct media_pad *pads = ved->pads;
>>> ÂÂÂÂ ...
>>> ÂÂÂÂ vimc_pads_cleanup(pads);
>>> }
>>>
>>>
>>
>> I fixed a problem in the thirds patch. vimc-capture uses the first
>> approach - placing the kfree(ved->pads) inside the release callback.
>>
>> I am debugging another such problem in unbind path while streaming.
>> I am working on v2 and I will look for the rmmod problem and fix it.
>>
>> thanks again for testing and finding the root cause.
>> -- Shuah
>
> Hi Andre,
>
> Here is what's happening.
>
> Before this change, you can't really do rmmod vimc, because vimc is in
> use by other component drivers. With the collapse, now you can actually
> do rmmod on vimc and this problem in vimc_ent_sd_unregister() that frees
> pads first and the does v4l2_device_unregister_subdev().
>
> I fixed this in the 3/3 patch. I can reproduce the problem with patches 1 and 2, and patch 3 fixes it.
>
> Did you test with the third patch in this series?

yes, we tested with 3/3, but the new problem now is when doing the following
in this order:

v4l2_device_unregister_subdev(sd);
vimc_pads_cleanup(ved->pads);


v4l2_device_unregister_subdev() calls the release function of the subdevice that
frees the ved object, so ved->pads is not valid anymore. That is why Andrà suggested
a temporary variable to hold ved->pads and to be able to free it later:

struct media_pad *pads = ved->pads;

v4l2_device_unregister_subdev(sd);
vimc_pads_cleanup(pads); // So we don't use the ved object here anymore.


Regards,
Helen

>
> thanks,
> -- Shuah