Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents

From: Greg KH
Date: Tue Oct 17 2017 - 09:26:24 EST


On Tue, Oct 17, 2017 at 12:04:41PM +0200, Peter Rajnoha wrote:
> On 10/16/2017 04:49 PM, Greg KH wrote:
> > On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote:
> >> On 10/16/2017 03:55 PM, Greg KH wrote:
> >>> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
> >>>> Record last uevent seqnum that was used with kobject's uevent and send
> >>>> it with next uevent as PREV_SEQNUM=<num> variable.
> >>>>
> >>>> This enhances the way we are able to track and handle uevents in userspace
> >>>> if we are trying to catch up with all the uevents that were generated
> >>>> while we were not listening or the uevents which were not delivered.
> >>>> This may happen if the userspace listener is not running yet when the
> >>>> uevent is triggered or there's a period of time when it's not listening
> >>>> (e.g. because the userspace listener is being restarted while a new
> >>>> uevent fires).
> >>>>
> >>>> A userspace listener can compare the last uevent seqnum it knows about
> >>>> with the last uevent seqnum the kernel reports through uevents delivered
> >>>> to userspace, both genuine and synthetic ones (the ones generated by
> >>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
> >>>> userspace may execute appropriate actions based on that. We don't need
> >>>> to reevaluate and recheck all items, instead, we can concentrate only
> >>>> on items for which we really and actually need to reflect changed state
> >>>> in kernel while the userspace listener was not listening for uevents.
> >>>>
> >>>> Signed-off-by: Peter Rajnoha <prajnoha@xxxxxxxxxx>
> >>>> ---
> >>>> include/linux/kobject.h | 1 +
> >>>> lib/kobject.c | 1 +
> >>>> lib/kobject_uevent.c | 15 +++++++++++++--
> >>>> 3 files changed, 15 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> >>>> index e0a6205caa71..ee6db28b6186 100644
> >>>> --- a/include/linux/kobject.h
> >>>> +++ b/include/linux/kobject.h
> >>>> @@ -73,6 +73,7 @@ struct kobject {
> >>>> #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
> >>>> struct delayed_work release;
> >>>> #endif
> >>>> + u64 last_uevent_seqnum;
> >>>> unsigned int state_initialized:1;
> >>>> unsigned int state_in_sysfs:1;
> >>>> unsigned int state_add_uevent_sent:1;
> >>>> diff --git a/lib/kobject.c b/lib/kobject.c
> >>>> index 763d70a18941..2cc809f10ae7 100644
> >>>> --- a/lib/kobject.c
> >>>> +++ b/lib/kobject.c
> >>>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
> >>>> return;
> >>>> kref_init(&kobj->kref);
> >>>> INIT_LIST_HEAD(&kobj->entry);
> >>>> + kobj->last_uevent_seqnum = 0;
> >>>> kobj->state_in_sysfs = 0;
> >>>> kobj->state_add_uevent_sent = 0;
> >>>> kobj->state_remove_uevent_sent = 0;
> >>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >>>> index f237a09a5862..f1cbed06a395 100644
> >>>> --- a/lib/kobject_uevent.c
> >>>> +++ b/lib/kobject_uevent.c
> >>>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
> >>>> }
> >>>>
> >>>> mutex_lock(&uevent_sock_mutex);
> >>>> - /* we will send an event, so request a new sequence number */
> >>>> - retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
> >>>> + /*
> >>>> + * We will send an event, so request a new sequence number.
> >>>> + * Also, record the number for next uevent's PREV_SEQNUM.
> >>>> + */
> >>>> + kobj->last_uevent_seqnum = ++uevent_seqnum;
> >>>> + retval = add_uevent_var(env, "SEQNUM=%llu",
> >>>> + (unsigned long long)uevent_seqnum);
> >>>> + if (retval) {
> >>>> + mutext_unlock(&uevent_sock_mutex);
> >>>> + goto exit;
> >>>> + }
> >>>> + retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
> >>>> + (unsigned long long)kobj->last_uevent_seqnum);
> >>>
> >>> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
> >>> is only just the current one-1? What else do you do with that value?
> >> I'm sorry, I've sent incorrect patch by mistake. This is the correct one:
> >
> > Ok, this is a bit better (but it's not in a format I could apply it in,
> > if I wanted to...)
> >
> >> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
> >> From: Peter Rajnoha <prajnoha@xxxxxxxxxx>
> >> Date: Thu, 12 Oct 2017 11:33:48 +0200
> >> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents
> >>
> >> Record last uevent seqnum that was used with kobject's uevent and send
> >> it with next uevent as PREV_SEQNUM=<num> variable.
> >>
> >> This enhances the way we are able to track and handle uevents in userspace
> >> if we are trying to catch up with all the uevents that were generated
> >> while we were not listening or the uevents which were not delivered.
> >> This may happen if the userspace listener is not running yet when the
> >> uevent is triggered or there's a period of time when it's not listening
> >> (e.g. because the userspace listener is being restarted while a new
> >> uevent fires).
> >>
> >> A userspace listener can compare the last uevent seqnum it knows about
> >> with the last uevent seqnum the kernel reports through uevents delivered
> >> to userspace, both genuine and synthetic ones (the ones generated by
> >> writing uevent action name to /sys/.../uevent file). Then, if needed,
> >> userspace may execute appropriate actions based on that. We don't need
> >> to reevaluate and recheck all items, instead, we can concentrate only
> >> on items for which we really and actually need to reflect changed state
> >> in kernel while the userspace listener was not listening for uevents.
> >>
> >> Signed-off-by: Peter Rajnoha <prajnoha@xxxxxxxxxx>
> >
> > I don't really understand what userspace can do with this. Why does it
> > matter what the sequence number was for a specific kobject? What can
> > happen because of this?
> >
> > Do you have any example libudev code changes to show this in action so I
> > can maybe understand the need?
> >
>
> Not a libudev code change for this, at least I don't have this in plan
> at this moment.
>
> It's more related to the way the udev rules are processed and, in
> general, the way any other uevent listener can react to uevents. So the
> change is in how "end consumer" of the uevent does its processing.
>
> At the moment, the sequence number is used globally, we don't have
> per-kobject sequence numbers. So we can't use current SEQNUM that is
> attached to each uevent to tell whether we have missed previous
> uevent(s) for concrete device or not, we just know that some uevents for
> some devices were not processed. Therefore, when we're triggering
> uevents from userspace (udevadm trigger), we need to reevaluate all the
> rules for all devices, no matter if it's really needed or not.
>
> For example, when we're switching from initramfs to root fs, we stop the
> first udevd instance which is in initramfs, we keep the database (at the
> moment, only for selected devices where we know that it's safe to do so
> - same rules used in initramfs as in root fs). Then we switch over to
> root fs and start the udevd again and we trigger uevents to fully
> refresh the state, taking also the existing database into account.
>
> Now, it would help if we knew whether there were any uevents issued
> after we stopped udevd in initramfs and before we started a new one from
> root fs. If there are no uevents in between, we don't need to reevaluate
> all the rules, we can just keep the existing database content. This can
> become prominent if we have more devices in the system. It helps to
> minimize the number of execution during boot, the overall impact of
> triggering uevents and related processing, hence also making it more
> straightforward to watch the state of the device and possibly debug
> related problems.
>
> The same principle applies to any uevent listener in userspace, not just
> udevd:
>
> - recording last seen SEQNUM
>
> - when new instance of the listener is running
>
> - trigger synthetic uevents to refresh state
>
> - comparing the PREV_SEQNUM from uevent generated with stored SEQNUM:

Why can't you just compare SEQNUM with the "last seen" SEQNUM, they
should just be 1 apart, right? Isn't that what is done today? How will
this differ from looking at PREV_SEQNUM?

> - If they differ, refresh the state fully, considering the state
> we know of as obsolete as there were already other uevents before we
> haven't processed. This may include deeper scans which may take longer,
> executing more helper tools to get the state of the device.

But that is what "last seen" also shows.

> - If the are same, reuse the state we already know about is
> up-to-date (there were no uevents we haven't seen), don't reevaluate
> everything.

Testing for +1 should be the same.

> With the new synthetic uevent interface (the extention to
> /sys/.../uevent) that accepts arguments (UUID and key=value pairs), we
> can make it possible to specify the nature of the triggered uevent, we
> can mark it through the key=value pairs which the uevent listener in
> userspace can check for, so if we can also:

but that doesn't matter for PREV_SEQNUM.

> - either refresh only the state for devices where we missed uevents
> (that could be used during bootup, only to refresh the state for devices
> where we have no current state yet), comparing current PREV_SEQNUM and
> stored last SEQNUM in userspace
>
> - or to force full refresh if that's needed for a reason.
>
> Also, some devices may have more complex activation scheme than usual
> "device addition" to the system. There can be a sequence of uevents
> which are expected for a successful change of device state, so we have a
> state machine that tracks uevents (as a confirmation from kernel for
> userspace that we can hook on in udev rules and uevent listeners). And
> here, it's very helpful to know that we have missed uevent(s) before or
> not, either they were not delivered or the uevent listener itself has
> been restarted and during this period, it may have missed uevents.
>
> In summary, it's adding more robustness to the uevent interface where we
> can directly and quickly detect we missed the uevent or not for a
> specific device and hence we can optimize udev rules/uevent listeners
> based on that so we don't dully refresh all the state and reexecute all
> rules and code for any triggered uevents. It's really not needed all the
> time.

I'd like to see real, working code that uses this new interface, showing
how it is not possible to do what is needed without the PREV_SEQNUM,
before considering adding this. Otherwise it's just stuff we have to
drag around for forever.

thanks,

greg k-h