Re: [PATCH v3 8/8] mfd: cros_ec: add a dev_release empty method.

From: Guenter Roeck
Date: Thu Nov 29 2018 - 17:29:02 EST


On Thu, Nov 29, 2018 at 2:11 PM Enric Balletbo i Serra
<enric.balletbo@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 29/11/18 8:55, Greg Kroah-Hartman wrote:
> > On Wed, Nov 28, 2018 at 05:17:22PM -0800, Guenter Roeck wrote:
> >> Hi Greg,
> >>
> >> On Tue, Nov 27, 2018 at 9:52 AM Greg Kroah-Hartman
> >> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Tue, Nov 27, 2018 at 09:29:38AM -0800, Guenter Roeck wrote:
> >>>> Hi Enric,
> >>>>
> >>>> On Tue, Nov 27, 2018 at 4:19 AM Enric Balletbo i Serra
> >>>> <enric.balletbo@xxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>> Devices are required to provide a release method. This patch fixes the
> >>>>> following WARN():
> >>>>>
> >>>>> [ 47.218707] ------------[ cut here ]------------
> >>>>> [ 47.223901] Device 'cros_ec' does not have a release() function, it is broken and must be fixed.
> >>>>> [ 47.234430] WARNING: CPU: 0 PID: 3585 at drivers/base/core.c:895 device_release+0x80/0x90
> >>>>> [ 47.243560] Modules linked in: btusb btrtl btintel btbcm bluetooth ecdh_generic [...]
> >>>>> [ 47.323851] CPU: 0 PID: 3585 Comm: rmmod Not tainted 4.20.0-rc2+ #29
> >>>>> [ 47.330947] Hardware name: Google Kevin (DT)
> >>>>> [ 47.335714] pstate: 40000005 (nZcv daif -PAN -UAO)
> >>>>> [ 47.341063] pc : device_release+0x80/0x90
> >>>>> [ 47.345537] lr : device_release+0x80/0x90
> >>>>> [ 47.350001] sp : ffff00000b17bc70
> >>>>> [ 47.353698] x29: ffff00000b17bc70 x28: ffff8000e48e9a80
> >>>>> [ 47.359629] x27: 0000000000000000 x26: 0000000000000000
> >>>>> [ 47.365561] x25: 0000000056000000 x24: 0000000000000015
> >>>>> [ 47.371492] x23: ffff8000f0248060 x22: ffff000000b700a0
> >>>>> [ 47.377414] x21: ffff8000edf56100 x20: ffff8000edd13028
> >>>>> [ 47.383346] x19: ffff8000edd13018 x18: 0000000000000095
> >>>>> [ 47.389278] x17: 0000000000000000 x16: 0000000000000000
> >>>>> [ 47.395209] x15: 0000000000000400 x14: 0000000000000400
> >>>>> [ 47.401131] x13: 00000000000001a7 x12: 0000000000000000
> >>>>> [ 47.407053] x11: 0000000000000001 x10: 0000000000000960
> >>>>> [ 47.412976] x9 : ffff00000b17b9b0 x8 : ffff8000e48ea440
> >>>>> [ 47.418898] x7 : ffff8000ee9090c0 x6 : ffff8000f7d0b0b8
> >>>>> [ 47.424830] x5 : ffff8000f7d0b0b8 x4 : 0000000000000000
> >>>>> [ 47.430752] x3 : ffff8000f7d11e68 x2 : ffff8000e48e9a80
> >>>>> [ 47.436674] x1 : 37d859939c964800 x0 : 0000000000000000
> >>>>> [ 47.442597] Call trace:
> >>>>> [ 47.445324] device_release+0x80/0x90
> >>>>> [ 47.449414] kobject_put+0x74/0xe8
> >>>>> [ 47.453210] device_unregister+0x20/0x30
> >>>>> [ 47.457592] ec_device_remove+0x34/0x48 [cros_ec_dev]
> >>>>> [ 47.463233] platform_drv_remove+0x28/0x48
> >>>>> [ 47.467805] device_release_driver_internal+0x1a8/0x240
> >>>>> [ 47.473630] driver_detach+0x40/0x80
> >>>>> [ 47.477609] bus_remove_driver+0x54/0xa8
> >>>>> [ 47.481986] driver_unregister+0x2c/0x58
> >>>>> [ 47.486355] platform_driver_unregister+0x10/0x18
> >>>>> [ 47.491599] cros_ec_dev_exit+0x1c/0x258 [cros_ec_dev]
> >>>>> [ 47.497338] __arm64_sys_delete_module+0x16c/0x1f8
> >>>>> [ 47.502689] el0_svc_common+0x84/0xd8
> >>>>> [ 47.506776] el0_svc_handler+0x2c/0x80
> >>>>> [ 47.510960] el0_svc+0x8/0xc
> >>>>> [ 47.514171] ---[ end trace 9087279fc8c03450 ]---
> >>>>>
> >>>>> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@xxxxxxxxxxxxx>
> >>>>> ---
> >>>>>
> >>>>> Changes in v3: None
> >>>>> Changes in v2:
> >>>>> - Fix WARN when unloading. This is new in this series.
> >>>>>
> >>>>> drivers/mfd/cros_ec_dev.c | 5 +++++
> >>>>> 1 file changed, 5 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/mfd/cros_ec_dev.c b/drivers/mfd/cros_ec_dev.c
> >>>>> index 1ba98a32715e..cdb941c6db98 100644
> >>>>> --- a/drivers/mfd/cros_ec_dev.c
> >>>>> +++ b/drivers/mfd/cros_ec_dev.c
> >>>>> @@ -35,9 +35,14 @@
> >>>>> #define CROS_MAX_DEV 128
> >>>>> static int ec_major;
> >>>>>
> >>>>> +static void cros_ec_dev_release(struct device *dev)
> >>>>> +{
> >>>>> +}
> >>>
> >>> Yeah, as part of the in-kernel documentation, I now get to make fun of
> >>> you in public!
> >>>
> >>> You did read the documentation, right?
> >>>
> >>
> >> To be fair, the problem is difficult to understand. Maybe it is easy
> >> for you, but that is not true for everyone, including me. Remember the
> >> block discussion we just had ? As for the in-kernel documentation,
> >> maybe there is a comprehensive explanation somewhere, one that clueless
> >> people like me can understand, but all I found was
> >>
> >> "If a bus driver unregisters a device, it should not immediately free
> >> it. It should instead wait for the driver model core to call the
> >> device's release method, then free the bus-specific object.
> >> (There may be other code that is currently referencing the device
> >> structure, and it would be rude to free the device while that is
> >> happening)"
> >>
> >> Does that apply to mfd devices ? What other code may that be that
> >> accesses the structure ? What else does it mean, or in other words,
> >> what other cleanup code besides releasing the data structure needs to
> >> reside in the release function ?
> >
>
> I think that this can be one of those cases where using device managed
> allocations is not right. If so we only need to revert commit
>
> 3aa2177e4787 ("mfd: cros_ec: Use devm_kzalloc for private data")
>

Hmm, yes, that patch looks problematic.

> I think the problem might be a dereference of already-freed memory: a file
> operation can still access the device after the struct has been freed, so the
> allocated structure should only be freed after the last release call, because
> you can't guarantee it is _not_ used before that. In this case class_dev is
> embedded in the struct, so I guess the only resource we need to free is the
> cros_ec device struct. I may be wrong; I didn't continue the research.
>
> This is what Guenter made me think of when he said "object lifetime"; then I
> read Greg's answer. I felt bad, so I abandoned that task and switched to
> another one. There were still open questions in my mind, but I was not very
> motivated to solve them.
>
> Before sending the patch I looked at the code and saw that there are several
> places where an "empty" release function is used. If this is never allowed,
> maybe we can create a cocci script to catch these cases. I started this script
> (thanks Peter for helping me). It only detects two places, but the script is
> not complete: it should also take into consideration the case where the
> release function is assigned inside a function (people usually do this)
> instead of directly in the struct initializer. I'll be happy to help with this
> if people think it will be useful.

I think it would be useful. It should also detect empty device release
functions, such as the one you tried to introduce here.
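
For contrast, a non-empty release normally frees the object that embeds
the device, and only once the last reference is gone. Below is a minimal
userspace sketch of that lifetime rule. It is a model under stated
assumptions, not the cros_ec code: struct fake_device, struct fake_ec and
all fake_* helpers are hypothetical names standing in for struct device,
the driver's private structure, and get_device()/put_device().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: a refcount plus a release hook. */
struct fake_device {
	int refcount;
	void (*release)(struct fake_device *dev);
};

/* Hypothetical driver object that embeds the device, like class_dev. */
struct fake_ec {
	int id;
	struct fake_device class_dev;
};

/* Counts release calls so a caller can observe when the free happened. */
static int released_count;

/* Take an extra reference, e.g. for an open file handle. */
static void fake_device_get(struct fake_device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the release hook runs only when the last one goes. */
static void fake_device_put(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* Non-empty release: recover the containing object and free it here. */
static void fake_ec_release(struct fake_device *dev)
{
	struct fake_ec *ec = (struct fake_ec *)
		((char *)dev - offsetof(struct fake_ec, class_dev));

	released_count++;
	free(ec);
}

/* Allocate the object with one initial reference held by the creator. */
static struct fake_ec *fake_ec_create(int id)
{
	struct fake_ec *ec = calloc(1, sizeof(*ec));

	if (!ec)
		return NULL;
	ec->id = id;
	ec->class_dev.refcount = 1;
	ec->class_dev.release = fake_ec_release;
	return ec;
}
```

In this model, if a second user calls fake_device_get() and the creator
then calls fake_device_put() (the "unregister" step), the memory stays
valid until the second user also drops its reference. An empty release
combined with a managed or immediate free would instead leave that second
user pointing at freed memory.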

Thanks,
Guenter

>
>
> @r1@
> identifier I, s, func;
> @@
> struct I s = { ..., .dev_release = func, ...};
>
> @r2@
> identifier r1.func;
> position p1;
> @@
> func@p1(...){}
>
> @script:python@
> fn << r1.func;
> p1 << r2.p1;
> @@
>
> print ("%s:%s empty release function at lines %s" % (p1[0].file,fn,p1[0].line))
>
> Thanks
> Enric
>
> > To quote Documentation/kobject.txt:
> > One important point cannot be overstated: every kobject must
> > have a release() method, and the kobject must persist (in a
> > consistent state) until that method is called. If these
> > constraints are not met, the code is flawed. Note that the
> > kernel will warn you if you forget to provide a release()
> > method. Do not try to get rid of this warning by providing an
> > "empty" release function; you will be mocked mercilessly by the
> > kobject maintainer if you attempt this.
> >
> > The fact that you couldn't even find this means that it probably is in
> > the wrong place, but then, where is the "right" place for where everyone
> > can see it? Should I refer to this file in the kernel error message?
> >
> > That file also should answer your other questions about lifetime rules
> > of kobjects, which is really the same thing as 'struct device' here. If
> > not, please let me know and I can fix it up.
> >
> > thanks,
> >
> > greg k-h
> >