Re: FYI: Userland breakage caused by udev bind commit

From: Dmitry Torokhov
Date: Mon Dec 24 2018 - 12:34:47 EST


On Mon, Dec 24, 2018 at 11:54:07AM +0100, Greg KH wrote:
> On Mon, Dec 24, 2018 at 11:15:34AM +0100, Gabriel C wrote:
> > On Mon, 24 Dec 2018 at 10:17, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Dec 24, 2018 at 08:31:27AM +0100, Gabriel C wrote:
> > > > On Sun, 23 Dec 2018 at 19:09, Dmitry Torokhov
> > > > <dmitry.torokhov@xxxxxxxxx> wrote:
> > > >
> > > > [ also added Linus to CC on that one too ]
> > > > >
> > > > > On Sun, Dec 23, 2018 at 06:17:04PM +0100, Christian Brauner wrote:
> > > > > > On Sun, Dec 23, 2018 at 05:49:54PM +0100, Marcus Meissner wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am the maintainer of libmtp and libgphoto2
> > > > > > >
> > > > > > > Some months ago I was made aware of this bug:
> > > > > > > https://bugs.kde.org/show_bug.cgi?id=387454
> > > > > > >
> > > > > > > This was fallout identified to come from this kernel commit:
> > > > > > >
> > > > > > > commit 1455cf8dbfd06aa7651dcfccbadb7a093944ca65
> > > > > > > Author: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>
> > > > > > > Date: Wed Jul 19 17:24:30 2017 -0700
> > > > > >
> > > > > > Fwiw, the addition of {un}bind events has caused issues for
> > > > > > systemd-udevd as well and is tracked here:
> > > > > > https://github.com/systemd/systemd/issues/7587
> > > > > > I haven't been aware of this until yesterday and it seems that so far
> > > > > > this hasn't been brought up on lkml until you did now.
> > > > >
> > > > > The fallout was caused by premature enabling of the new events in
> > > > > systemd/udev by yours truly (even though the commit has Lennart's name
> > > > > on it due to how it was merged):
> > > > >
> > > > > https://github.com/systemd/systemd/commit/9a39e1ce314d1a6f8a754f6dab040019239666a9
> > > > >
> > > > > "Add handling for bind/unbind actions (#6720)
> > > > >
> > > > > Newer kernels will emit uevents with "bind" and "unbind" actions. These
> > > > > uevents will be issued when driver is bound to or unbound from a device.
> > > > > "Bind" events are helpful when device requires a firmware to operate
> > > > > properly, and driver is unable to create a child device before firmware
> > > > > is properly loaded.
> > > > >
> > > > > For some reason systemd validates actions and drops the ones it does not
> > > > > know, instead of passing them on through as old udev did, so we need to
> > > > > explicitly teach it about them."
> > > > >
> > > > > Similarly it is now papered over in systemd/udev until we make it
> > > > > properly handle new events:
> > > > >
> > > > > https://github.com/systemd/systemd/commit/56c886dc7ed5b2bb0882ba85136f4070545bfc1b
> > > > >
> > > > > "sd-device: ignore bind/unbind events for now
> > > > >
> > > > > Until systemd/udev are ready for the new events and do not flush entire
> > > > > device state on each new event received, we should ignore them."
> > > > >
> > > >
> > > > And how about peoples still uses systemd < 235 and newer kernels ?
> > >
> > > Is that an issue? Who uses that, and does it cause problems on their
> > > systems given that the events just do not do anything for those systems?
> > >
> > > We tested this out a lot back in the summer of 2017 and I thought all
> > > was well. What recently changed that caused breakages to suddenly show
> > > up? How have we not seen this until now?
> > >
> >
> > Well people observed that , please click the bug link for that KDE bug.
> > Reported '2017-11-30'..
> >
> > I can reproduce that on systemd 231 ( which we have here ) and
> > kernels >= 4.14 just easy.
> >
> > Can't use any mtp devices all dropping :
> >
> > The file or folder udi=/org/kde/solid/udev/....... does not exists'
> >
> > Why it got not reported here is probably because people are shy to
> > report such things to LKML.
> >
> > > We can drop the "new" uevents now by reverting the patch, but what about
> > > the userspace tools that now depend on them as we have had them in our
> > > kernels for so long? We can't now break them, right? Should we add a
> > > new kernel config option to not emit those for older userspaces that can
> > > not handle this (of which I really still do not understand given that we
> > > tested the heck out of this last year...)
> >
> > Peoples started to add workarounds to make it work somewhat again.
> >
> > Greg any such changes to udev are very fragile.
>
> I am not changing udev. Well, Dmitry changed udev, and then reverted
> it, so all should be fine :)
>
> > Also dropping some patch to systemd-udev won't solve anything on such moves.
>
> If systemd-udev was broken, it should resolve the issue, right?
>
> > Remember there exist other udev implementations too and not only that.
>
> Ok, what other udev implementations are broken and why have we not heard
> from them in the past 1 1/2 years?
>
> > See example below :
> >
> > app1- xxx - depending on some udev / kernel behaviour ( add rule in this case )
> > kernel - xxx changes that ( adding bind which confuses add to userspace )
>
> No, another random uevent should never confuse userspace as userspace
> always had to properly handle any uevent it got, no matter what it was
> called. Why would userspace get confused?
>
> > - on update to that kernel app1 breaks..
> > - udevd - drops an patch in to catch up
> > - app1 trying to workaround now both ( which is that case here )
> > and now here the mess starts.
>
> What application is working around what exactly? Specific patches would
> be good to point to.
>
> > Having app1-fixed for kernel who changed behaviour and using now
> > and kernel does not have this makes app1 breaks again
> >
> > Using fixed udev and app1 without workarounds on kernel with bind breaks,
> > using not fixed udev , app1 without workround breaks etc..
> >
> > >
> > > still confused,
> > >
> >
> > The problem I see here is 'bind' confuses 'add'.
> >
> > So is there a way to make bind event _not_ confusing add event ?
>
> A bind event should not confuse any other events at all, it is as if
> adding any other type of uevent would also confuse an add event?
>
> Something is really wrong if that were to happen why is udev thinking
> 'bind' is the same as 'add'? Is it also thinking that 'unbind' is the
> same as 'add'?

The issue is a combination of factors:

1. systemd/udev/eudev flush the state for a device on each new uevent,
so receiving "bind", or any new uevent that we might create in the
future, drops everything that was added by rules for "add".

2. Most rules have the stanza ACTION!="add|change" GOTO="end", which
is actually a proper expression, but has the unfortunate effect of not
re-adding the properties that were dropped.

Some package maintainers started changing this to ACTION=="remove",
ACTION!="bind", etc, but I think this is actually not a good long-term
strategy.
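
To make the interaction of the two factors concrete, here is a toy model
in Python. It is only a sketch and not real udev code: the property name
EXAMPLE_PROP and the helpers run_rules() and process() are hypothetical,
and the rule file is reduced to a single guard plus one property
assignment. It assumes the per-event flush described in point 1.

```python
# Toy model only; not real udev code. All names here are hypothetical.

def run_rules(action, skip_rule):
    """Return the properties a one-rule rule file assigns for one uevent."""
    if skip_rule(action):
        return {}                   # guard matched: GOTO="end", body skipped
    return {"EXAMPLE_PROP": "1"}    # rule body: set one device property


def process(events, skip_rule):
    """Feed a sequence of uevent actions through the rules for one device."""
    db = {}
    for action in events:
        # Factor 1: stored state is rebuilt from scratch on every uevent,
        # so whatever an earlier event set is dropped here.
        db = run_rules(action, skip_rule)
    return db


def typical_guard(action):
    # Factor 2: models ACTION!="add|change" GOTO="end"
    return action not in ("add", "change")


def workaround_guard(action):
    # Models the packagers' change: ACTION=="remove" GOTO="end"
    return action == "remove"


after_add = process(["add"], typical_guard)
after_bind = process(["add", "bind"], typical_guard)
patched = process(["add", "bind"], workaround_guard)

print(after_add)    # property present after "add"
print(after_bind)   # empty: "bind" flushed it and the guard skipped the body
print(patched)      # property present again with the workaround guard
```

In this model the workaround guard restores the property only because the
rule body is re-run for every action that is not "remove", including any
uevent types added later.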

>
> And see Dmitry's email, it seems that all of the combinations are now
> handled properly.

I was talking about systemd only, but I guess we do have eudev...

>
> If not, how to resolve this?

Well, it appears that we can no longer extend the uevent interface with new
types of uevents, at least not until we go and fix up all
udev derivatives and give things some time to settle. I am not sure
that abusing "change" to signal bind/unbind, as Lennart suggested on
one of the threads, is such a great idea, and it is not really extensible.

I guess reverting is the right solution here. I wish folks would yell
earlier though...

Thanks.

--
Dmitry