Re: [bisected] Re: Module removal-related regression?

From: Dmitry Torokhov
Date: Sun Sep 10 2017 - 17:22:30 EST


On Sun, Sep 10, 2017 at 12:13 PM, Jakub Kicinski <kubakici@xxxxx> wrote:
> On Sun, 10 Sep 2017 21:09:08 +0200, Greg Kroah-Hartman wrote:
>> On Sun, Sep 10, 2017 at 11:12:17AM -0700, Dmitry Torokhov wrote:
>> > On September 10, 2017 11:00:10 AM PDT, Jakub Kicinski <kubakici@xxxxx> wrote:
>> > >On Sun, 10 Sep 2017 09:21:11 -0700, Dmitry Torokhov wrote:
>> > >> On Sun, Sep 10, 2017 at 12:03:38AM +0200, Jakub Kicinski wrote:
>> > >> > On Sat, 09 Sep 2017 13:59:25 -0700, Dmitry Torokhov wrote:
>> > >> > > On September 9, 2017 1:17:26 PM PDT, Jakub Kicinski
>> > ><kubakici@xxxxx> wrote:
>> > >> > > >On Sat, 9 Sep 2017 12:55:51 -0700, Dmitry Torokhov wrote:
>> > >> > > >> On Sat, Sep 9, 2017 at 12:27 PM, Jakub Kicinski
>> > ><kubakici@xxxxx>
>> > >> > > >wrote:
>> > >> > > >> > On Sat, 9 Sep 2017 19:41:21 +0200, Jakub Kicinski wrote:
>> > >
>> > >> > > >> >> Hi!
>> > >> > > >> >>
>> > >> > > >> >> I'm having trouble with modules on linux/master. rmmod
>> > >succeeds
>> > >> > > >but the
>> > >> > > >> >> module is still loaded and the refcount goes to 1:
>> > >> > > >> >>
>> > >> > > >> >> #rmmod nfp; insmod ./src/nfp.ko nfp_pf_netdev=0 ; \
>> > >> > > >> >> /opt/netronome/bin/nfp-hwinfo -n 2 assembly.partno \
>> > >> > > >> >> lsmod | grep nfp; \
>> > >> > > >> >> rmmod nfp; \
>> > >> > > >> >> lsmod | grep nfp
>> > >> > > >> >> nfp 249856 0
>> > >> > > >> >> nfp 200704 1
>> > >> > > >> >>
>> > >> > > >> >> If I rmmod again the module will be actually unloaded. The
>> > >user
>> > >> > > >space
>> > >> > > >> >> is mostly Ubuntu 14.04. Has anyone seen this? I'm trying
>> > >to
>> > >> > > >bisect
>> > >> > > >> >> now...
>> > >> > > >> >
>> > >> > > >> > Got 'em!
>> > >> > > >> >
>> > >> > > >> > commit 1455cf8dbfd06aa7651dcfccbadb7a093944ca65 (HEAD,
>> > >> > > >refs/bisect/bad)
>> > >> > > >> > Author: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>
>> > >> > > >> > Date: Wed Jul 19 17:24:30 2017 -0700
>> > >> > > >> >
>> > >> > > >> > driver core: emit uevents when device is bound to a
>> > >driver
>> > >> > > >>
>> > >> > > >> Does it happen with all modules or only nfp one?
>> > >> > > >>
>> > >> > > >> It seems to work here:
>> > >> > > >>
>> > >> > > >> dtor@dtor-glaptop3:~ $ lsmod | grep psmouse
>> > >> > > >> psmouse 135168 0
>> > >> > > >> dtor@dtor-glaptop3:~ $ sudo rmmod psmouse
>> > >> > > >> dtor@dtor-glaptop3:~ $ lsmod | grep psmouse
>> > >> > > >> dtor@dtor-glaptop3:~ $ sudo modprobe psmouse
>> > >> > > >
>> > >> > > >It looks like the driver is actually reloaded. The driver used
>> > >to
>> > >> > > >return EPROBE_DEFER, but I think it doesn't any more (rebuilding
>> > >the
>> > >> > > >kernel to test that right now).
>> > >> > > >
>> > >> > > >Could the uevent on unbind tickle Ubuntu 14.04's udev or somehow
>> > >> > > >else cause the driver to be loaded again?
>> > >> > >
>> > >> > > It depends on how silly the udev rules are, but yes, this can
>> > >definitely happen.
>> > >> >
>> > >> > I confirmed the driver doesn't use EPROBE_DEFER any more:
>> > >> >
>> > >> > $ grep -nrI EPROBE_DEFER drivers/net/ethernet/netronome/
>> > >> > $
>> > >>
>> > >> Not sure why you bring the deferrals here, they have nothing to do
>> > >with
>> > >> module removal. Also, deferrals are rarely issued by the leaf driver,
>> > >and
>> > >> more often by providers of resources (GPIO, regulator, interrupt,
>> > >etc).
>> > >
>> > >Yes, it's unusual, but this driver used to do it. Which is exactly why
>> > >I brought it up. Turns out it was irrelevant :)
>> > >
>> > >> > I tested without any udev rules in /etc/udev/, just the standard
>> > >distro
>> > >> > ones. Same thing.
>> > >>
>> > >> Right, so this is the default udev rule:
>> > >>
>> > >> /lib/udev/rules.d/80-drivers.rules:
>> > >>
>> > >> # do not edit this file, it will be overwritten on update
>> > >>
>> > >> ACTION=="remove", GOTO="drivers_end"
>> > >>
>> > >> ENV{MODALIAS}=="?*", RUN{builtin}="kmod load $env{MODALIAS}"
>>
>> So if the new uevents do not have the MODALIAS line in them, then they
>> will not trigger this? Dmitry, can you see if that would fix this
>> problem without having to fix everyone's old versions of udev/systemd?

Unfortunately, MODALIAS= is added by the individual subsystems, each
in its own subsystem-specific format, so the driver core cannot simply
leave it out of the new uevents. The alternative would be for
kobject_uevent_env() to poke into the generated environment and zap
the MODALIAS= variables for KOBJ_BIND/KOBJ_UNBIND actions, if you'd be
OK with that.

Let me know and I can try to come up with a patch.
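
To make the idea concrete, here is a rough userspace sketch of the
filtering step. It is not a kernel patch. It models the environment as
a plain array of "KEY=value" strings, which is an assumption made for
illustration, and the names `enum action` and `zap_modalias()` are
hypothetical, not existing kernel symbols.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the uevent actions; not the kernel's enum. */
enum action { ACT_ADD, ACT_REMOVE, ACT_BIND, ACT_UNBIND };

/*
 * For bind/unbind actions, drop every "MODALIAS=..." entry from
 * envp[0..n) in place, keeping the order of the remaining entries.
 * Other actions are left untouched. Returns the new entry count.
 */
static size_t zap_modalias(enum action act, const char **envp, size_t n)
{
    static const char key[] = "MODALIAS=";
    size_t kept = 0;

    if (act != ACT_BIND && act != ACT_UNBIND)
        return n;

    for (size_t i = 0; i < n; i++) {
        /* Skip entries whose name is exactly MODALIAS. */
        if (strncmp(envp[i], key, sizeof(key) - 1) == 0)
            continue;
        envp[kept++] = envp[i];
    }
    return kept;
}
```

With this model, an "add" event keeps its MODALIAS= entry, so module
autoloading on hotplug is unaffected, while bind and unbind events
lose it and no longer match a rule keyed on ENV{MODALIAS}.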

I'm still going to submit a correction for the rule to the systemd folks.
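
For reference, the behaviour of the quoted rule can be modelled in a
few lines of C. The first function mirrors the rule as quoted above
(skip only "remove", then load on any non-empty MODALIAS). The second
is one possible correction, assuming the rule is restricted to "add"
events; both function names are made up for this sketch and the final
rule may look different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* MODALIAS=="?*" matches any non-empty value. */
static bool has_modalias(const char *modalias)
{
    return modalias != NULL && modalias[0] != '\0';
}

/* Rule as quoted: everything except "remove" may trigger a load. */
static bool quoted_rule_loads(const char *action, const char *modalias)
{
    if (strcmp(action, "remove") == 0)
        return false;
    return has_modalias(modalias);
}

/* Hypothetical corrected rule: only "add" may trigger a load. */
static bool add_only_rule_loads(const char *action, const char *modalias)
{
    if (strcmp(action, "add") != 0)
        return false;
    return has_modalias(modalias);
}
```

Under the quoted rule an "unbind" event carrying MODALIAS asks kmod to
load the module again, which matches the reload seen right after
rmmod. Under the add-only variant the same event is ignored.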

>
> Perhaps another option is dropping the unbind event? From the commit
> message it seems like only bind is really needed ATM. Do events have
> to be symmetrical?

While you are absolutely right that bind is the most important one,
I'd be hesitant to remove unbind even though we do not have a concrete
use case for it yet. The bind operation complements unbind, so having
a bind uevent but not an unbind one "feels weird".

Thanks.

--
Dmitry