Re: [PATCH v3 8/8] mfd: cros_ec: add a dev_release empty method.

From: Enric Balletbo i Serra
Date: Thu Nov 29 2018 - 17:11:24 EST


Hi,

On 29/11/18 8:55, Greg Kroah-Hartman wrote:
> On Wed, Nov 28, 2018 at 05:17:22PM -0800, Guenter Roeck wrote:
>> Hi Greg,
>>
>> On Tue, Nov 27, 2018 at 9:52 AM Greg Kroah-Hartman
>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On Tue, Nov 27, 2018 at 09:29:38AM -0800, Guenter Roeck wrote:
>>>> Hi Enric,
>>>>
>>>> On Tue, Nov 27, 2018 at 4:19 AM Enric Balletbo i Serra
>>>> <enric.balletbo@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Devices are required to provide a release method. This patch fixes the
>>>>> following WARN():
>>>>>
>>>>> [ 47.218707] ------------[ cut here ]------------
>>>>> [ 47.223901] Device 'cros_ec' does not have a release() function, it is broken and must be fixed.
>>>>> [ 47.234430] WARNING: CPU: 0 PID: 3585 at drivers/base/core.c:895 device_release+0x80/0x90
>>>>> [ 47.243560] Modules linked in: btusb btrtl btintel btbcm bluetooth ecdh_generic [...]
>>>>> [ 47.323851] CPU: 0 PID: 3585 Comm: rmmod Not tainted 4.20.0-rc2+ #29
>>>>> [ 47.330947] Hardware name: Google Kevin (DT)
>>>>> [ 47.335714] pstate: 40000005 (nZcv daif -PAN -UAO)
>>>>> [ 47.341063] pc : device_release+0x80/0x90
>>>>> [ 47.345537] lr : device_release+0x80/0x90
>>>>> [ 47.350001] sp : ffff00000b17bc70
>>>>> [ 47.353698] x29: ffff00000b17bc70 x28: ffff8000e48e9a80
>>>>> [ 47.359629] x27: 0000000000000000 x26: 0000000000000000
>>>>> [ 47.365561] x25: 0000000056000000 x24: 0000000000000015
>>>>> [ 47.371492] x23: ffff8000f0248060 x22: ffff000000b700a0
>>>>> [ 47.377414] x21: ffff8000edf56100 x20: ffff8000edd13028
>>>>> [ 47.383346] x19: ffff8000edd13018 x18: 0000000000000095
>>>>> [ 47.389278] x17: 0000000000000000 x16: 0000000000000000
>>>>> [ 47.395209] x15: 0000000000000400 x14: 0000000000000400
>>>>> [ 47.401131] x13: 00000000000001a7 x12: 0000000000000000
>>>>> [ 47.407053] x11: 0000000000000001 x10: 0000000000000960
>>>>> [ 47.412976] x9 : ffff00000b17b9b0 x8 : ffff8000e48ea440
>>>>> [ 47.418898] x7 : ffff8000ee9090c0 x6 : ffff8000f7d0b0b8
>>>>> [ 47.424830] x5 : ffff8000f7d0b0b8 x4 : 0000000000000000
>>>>> [ 47.430752] x3 : ffff8000f7d11e68 x2 : ffff8000e48e9a80
>>>>> [ 47.436674] x1 : 37d859939c964800 x0 : 0000000000000000
>>>>> [ 47.442597] Call trace:
>>>>> [ 47.445324] device_release+0x80/0x90
>>>>> [ 47.449414] kobject_put+0x74/0xe8
>>>>> [ 47.453210] device_unregister+0x20/0x30
>>>>> [ 47.457592] ec_device_remove+0x34/0x48 [cros_ec_dev]
>>>>> [ 47.463233] platform_drv_remove+0x28/0x48
>>>>> [ 47.467805] device_release_driver_internal+0x1a8/0x240
>>>>> [ 47.473630] driver_detach+0x40/0x80
>>>>> [ 47.477609] bus_remove_driver+0x54/0xa8
>>>>> [ 47.481986] driver_unregister+0x2c/0x58
>>>>> [ 47.486355] platform_driver_unregister+0x10/0x18
>>>>> [ 47.491599] cros_ec_dev_exit+0x1c/0x258 [cros_ec_dev]
>>>>> [ 47.497338] __arm64_sys_delete_module+0x16c/0x1f8
>>>>> [ 47.502689] el0_svc_common+0x84/0xd8
>>>>> [ 47.506776] el0_svc_handler+0x2c/0x80
>>>>> [ 47.510960] el0_svc+0x8/0xc
>>>>> [ 47.514171] ---[ end trace 9087279fc8c03450 ]---
>>>>>
>>>>> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@xxxxxxxxxxxxx>
>>>>> ---
>>>>>
>>>>> Changes in v3: None
>>>>> Changes in v2:
>>>>> - Fix WARN when unloading. This is new in these series.
>>>>>
>>>>> drivers/mfd/cros_ec_dev.c | 5 +++++
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/mfd/cros_ec_dev.c b/drivers/mfd/cros_ec_dev.c
>>>>> index 1ba98a32715e..cdb941c6db98 100644
>>>>> --- a/drivers/mfd/cros_ec_dev.c
>>>>> +++ b/drivers/mfd/cros_ec_dev.c
>>>>> @@ -35,9 +35,14 @@
>>>>> #define CROS_MAX_DEV 128
>>>>> static int ec_major;
>>>>>
>>>>> +static void cros_ec_dev_release(struct device *dev)
>>>>> +{
>>>>> +}
>>>
>>> Yeah, as part of the in-kernel documentation, I now get to make fun of
>>> you in public!
>>>
>>> You did read the documentation, right?
>>>
>>
>> To be fair, the problem is difficult to understand. Maybe it is easy
>> for you, but that is not true for everyone, including me. Remember the
>> block discussion we just had ? As for the in-kernel documentation,
>> maybe there is a comprehensive explanation somewhere, one that clueless
>> people like me can understand, but all I found was
>>
>> "If a bus driver unregisters a device, it should not immediately free
>> it. It should instead wait for the driver model core to call the
>> device's release method, then free the bus-specific object.
>> (There may be other code that is currently referencing the device
>> structure, and it would be rude to free the device while that is
>> happening)"
>>
>> Does that apply to mfd devices ? What other code may that be that
>> accesses the structure ? What else does it mean, or in other words,
>> what other cleanup code besides releasing the data structure needs to
>> reside in the release function ?
>

I think this may be one of those cases where using device-managed
allocations is not right. If so, we only need to revert commit

3aa2177e4787 ("mfd: cros_ec: Use devm_kzalloc for private data")

I think the problem might be a use-after-free: a file operation can still
access the device after the struct has already been freed. So the allocated
structure should only be freed once the release callback has run, because you
can't guarantee it is _not_ used before that. In this case class_dev is embedded
in the struct, so I guess the only resource we need to free is the cros_ec
device struct. I may be wrong; I didn't continue the research.
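
To make the lifetime argument concrete, here is a minimal userspace sketch,
not kernel code. All the mock_* names are invented for illustration; in the
kernel the roles would be played by struct device, get_device()/put_device()
and a release callback that kfree()s the containing struct. It only assumes
what is said above: the device is embedded in the private struct, and the
struct must outlive every user.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct device: a refcount plus a release hook. */
struct mock_device {
	int refcount;
	void (*release)(struct mock_device *dev);
};

/* Hypothetical private struct with the device embedded, like class_dev. */
struct mock_ec_dev {
	struct mock_device class_dev;
	int cmd_offset;
};

/* Counts how many times the private struct was freed (demo only). */
static int mock_ec_freed;

/* Same idea as container_of(): get the outer struct from the member. */
#define to_mock_ec_dev(d) \
	((struct mock_ec_dev *)((char *)(d) - offsetof(struct mock_ec_dev, class_dev)))

/* A non-empty release: this is the only place the struct is freed. */
static void mock_ec_release(struct mock_device *dev)
{
	free(to_mock_ec_dev(dev));
	mock_ec_freed++;
}

static void mock_get_device(struct mock_device *dev)
{
	dev->refcount++;
}

/* Dropping the last reference is what triggers the release callback. */
static void mock_put_device(struct mock_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* Allocate the private struct; the creator holds the first reference. */
static struct mock_ec_dev *mock_ec_create(void)
{
	struct mock_ec_dev *ec = calloc(1, sizeof(*ec));

	if (!ec)
		return NULL;
	ec->class_dev.refcount = 1;
	ec->class_dev.release = mock_ec_release;
	return ec;
}
```

If an open file takes a reference with mock_get_device() and the driver's
remove path then calls mock_put_device(), the struct stays valid until the
file drops its reference as well. A devm-style allocation would instead be
freed at remove time, regardless of who still holds a reference.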

This is what Guenter made me think of when he said "object lifetime"; then I
read Greg's answer. I felt bad, abandoned the task and switched to another
one. There were still open questions in my mind, but I was not very motivated
to solve them.

Before sending the patch I looked at the code and saw that there are several
places where an "empty" release function is used. If this is never allowed,
maybe we can create a cocci script to catch these cases. I started this script
(thanks Peter for helping me). It only detects two places, but the script is
not complete: it should also take into account the case where the release
function is assigned inside a function (which is what people usually do)
instead of directly in the struct initializer. I'll be happy to help with this
if people think it will be useful.


@r1@
identifier I, s, func;
@@
struct I s = { ..., .dev_release = func, ...};

@r2@
identifier r1.func;
position p1;
@@
func@p1(...){}

@script:python@
fn << r1.func;
p1 << r2.p1;
@@

print ("%s:%s empty release function at lines %s" % (p1[0].file,fn,p1[0].line))
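
For reference, the two assignment forms mentioned above look roughly like
this. It is a self-contained sketch with made-up mock_* types, not code taken
from any driver. The script as written should only catch the first form (the
struct initializer); the second form (assignment inside a function) is the
case it still misses.

```c
#include <assert.h>
#include <stddef.h>

struct mock_device;

/* Stand-in for a class-like struct with a dev_release hook. */
struct mock_class {
	const char *name;
	void (*dev_release)(struct mock_device *dev);
};

/* Stand-in for a device-like struct with its own release hook. */
struct mock_device {
	void (*release)(struct mock_device *dev);
};

/* The kind of "empty" release function the script is looking for. */
static void mock_empty_release(struct mock_device *dev)
{
}

/* Form 1: assigned directly in the struct initializer. */
static struct mock_class mock_class_a = {
	.name = "mock",
	.dev_release = mock_empty_release,
};

/* Form 2: assigned inside a function. */
static void mock_setup(struct mock_device *dev)
{
	dev->release = mock_empty_release;
}
```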

Thanks
Enric

> To quote Documentation/kobject.txt:
> One important point cannot be overstated: every kobject must
> have a release() method, and the kobject must persist (in a
> consistent state) until that method is called. If these
> constraints are not met, the code is flawed. Note that the
> kernel will warn you if you forget to provide a release()
> method. Do not try to get rid of this warning by providing an
> "empty" release function; you will be mocked mercilessly by the
> kobject maintainer if you attempt this.
>
> The fact that you couldn't even find this means that it probably is in
> the wrong place, but then, where is the "right" place for where everyone
> can see it? Should I refer to this file in the kernel error message?
>
> That file also should answer your other questions about lifetime rules
> of kobjects, which is really the same thing as 'struct device' here. If
> not, please let me know and I can fix it up.
>
> thanks,
>
> greg k-h
>