Re: [PATCH v4 1/4] firmware: Move umh locking code into fw_load_from_user_helper()

From: Luis R. Rodriguez
Date: Fri Sep 09 2016 - 18:10:11 EST


On Fri, Sep 09, 2016 at 12:19:38PM +0800, Ming Lei wrote:
> On Fri, Sep 9, 2016 at 11:39 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> > On Sep 8, 2016 6:22 PM, "Ming Lei" <ming.lei@xxxxxxxxxxxxx> wrote:
> >>
> >> On Fri, Sep 9, 2016 at 4:11 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx>
> >> wrote:
> >> > On Thu, Sep 08, 2016 at 11:37:54PM +0800, Ming Lei wrote:
> >> >> On Wed, Sep 7, 2016 at 4:45 PM, Daniel Wagner <wagi@xxxxxxxxx> wrote:
> >> >> > From: Daniel Wagner <daniel.wagner@xxxxxxxxxxxx>
> >> >> >
> >> >> > When we load the firmware directly we don't need to take the umh
> >> >> > lock.
> >> >>
> >> >> I am wondering if it can be wrong.
> >> >
> >> > If you disable the firmware UMH why would we need to lock if the lock is
> >> > being
> >> > shown only used for the firmare UMH ?
> >> >
> >> >> Actually in case of firmware loading, the usermode helper lock doesn't
> >> >> only mean the user helper is usable, and it also may serve to mark the
> >> >> filesystem/block device is ready for firmware loading, and of couse
> >> >> direct
> >> >> loading need fs/block to be ready too.
> >> >
> >> > Yes but that's a race I've identified a while ago present even if you
> >> > use initramfs *and*
> >> > use kernel_read_file_from_path() on any part of the kernel [0], I
> >> > proposed a possible
> >>
> >> Actualy I mean the situation of suspend vs. resume, and some drivers
> >> still may not benefit from firmware loading cache when requesting loading
> >> in .resume(), at that time it is still too early for direct loading.
> >> With UMH lock,
> >> we can get a warning or avoid the issue.
> >
> > Agreed, but that would seem odd and perhaps misleading to have a try lock
> > for UMH when no firmware UMH code is enabled. This should probably made
> > clear in comments for now as to why we have it then and we should just mark
>
> That is very helpful, :-)

Upon a closer look at how we use usermodehelper_read_trylock() and
usermodehelper_read_lock_wait() for the non-CONFIG_FW_LOADER_USER_HELPER kernels,
it seems we have to change things a bit more if we wanted to keep this *and*
I've personally come to the determination we can stuff all this crap under
CONFIG_FW_LOADER_USER_HELPER due to the firmware cache. Here's why:

As Daniel has it, the usermodehelper_read_lock_wait() case this unfolds into a:

schedule_timeout(0) when the usermodehelper_disabled is true, since after
the freezer was introduced this can be any of these values:

enum umh_disable_depth {
UMH_ENABLED = 0,
UMH_FREEZING,
UMH_DISABLED,
};

It means we schedule_timeout(0) anytime usermodehelper_disabled is either
UMH_FREEZING or UMH_DISABLED. Contrary to usermodehelper_read_trylock() it
will also *not* call try_to_freeze().

Given how we setup timer for this:

expire = timeout + jiffies;

setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
__mod_timer(&timer, expire, false);
schedule();
del_singleshot_timer_sync(&timer);

We will naturally *always* timeout on schedule_timeout(0), so this return 0
always.

As Daniel has it when CONFIG_FW_LOADER_USER_HELPER is enabled we do have a
timeout and that just ensures we schedule_timeout(timeout) and as such can run
in the loop for usermodehelper_read_trylock() trying to see if the
usermodehelper_disabled has finally flipped to UMH_ENABLED, after which we'd
grant entry, say you can pass Go, and collect $200.

Since we use schedule_timeout(0) for !CONFIG_FW_LOADER_USER_HELPER it means
we *always* break out of the loop immediately, and return 0. If we are
in !CONFIG_FW_LOADER_USER_HELPER and the usermodehelper_disabled is set
to UMH_ENABLED we *immediately* bail and return 0 as well.

So effectively this code:

_request_firmware()
{
...
timeout = usermodehelper_read_lock_wait(timeout);
if (!timeout) {
dev_dbg(device, "firmware: %s loading timed out\n",
name);
ret = -EBUSY;
goto out;
}
...
}

Will always returns -EBUSY if timeout is 0.

Our current default is 60 even for !CONFIG_FW_LOADER_USER_HELPER kernels, in
that case what this code does is only on FW_OPT_NOWAIT calls
(request_firmware_nowait()) it will wait for 60 seconds for the freezer to
clear.

In Daniel's code right now this would change the default timeout to 0, but that
would effectively break all request_firmware_nowait() calls. If we want to
keep the same functionality we'd have to keep the default timeout set to 60
then for !CONFIG_FW_LOADER_USER_HELPER.

For the non-FW_OPT_NOWAIT (regular sync request_firmware() calls) cases we have:

_request_firmware()
{
...
ret = usermodehelper_read_trylock();
if (WARN_ON(ret)) {
dev_err(device, "firmware: %s will not be loaded\n",
name);
goto out;
}
...
}

This will do the same loop but if we are freezing (UMH_FREEZING,) it *will*
take the time to call try_to_freeze(). If we've hit UMH_DISABLED we bail
with -EAGAIN. Contrary to the async case this doesn't time out, so it
just loops forever and schedules()'s in hopes we eventually change state.

So.. this would have to be kept if we want to keep the same behaviour.

Now that this is all groked the next question is do we really need this
crap ? My interest in this is not just for this patch series -- the
"sysdata" flexible API (which I'll re-brand to "driverdata" for lack of
any better alternative recommendations) also did away with this usermode
lock crap completely as none of the above was documented, and that API was
to *avoid* the fw usermode helper at all costs.

If we *really* need something like this it should mean all users of
kernel_read_file_from_path() might also need this. Note though:

a) we *warn* about calls during resume callbacks -- we want to avoid them,
note this was added via a144c6a ("PM: Print a warning if firmware is
requested when tasks are frozen"). That warn is currently only applicable
on sync requests. Note that the UMH code and commits indicate that
we actually cannot put tasks to sleep that are not interruptible so
that race will always exist, as such its rather fatal to use these
calls on resume, but we don't warn there.

b) if we've entered UMH_DISABLED (this means freeze_processes() is done)
we kill all requests until thaw_processes() is called when we are
coming out of suspend.

*But*

c) commit 07646d9c0938d ("firmware loader: cache devices firmware during
suspend/resume cycle") added a clever firmware cache solution to
*enable* drivers to call request firmware on resume ! The trick
to this and why we currently *do not get warnings on resume* calls
is the warning is part of _request_firmware() but only for the sync
case where we use usermodehelper_read_trylock(), and note :

the cache is checked earlier via __fw_lookup_buf()!

Effectively the firmware cache does aways with the stupid issues
with calling request_firmware() on resume, its effective so long
as you don't use the firmware usermode helper given that we otherwise
kill these on our way at suspend anyway. The firmware cache *does not*
help if you have are using the firmware usermode helper then, and
as such I can only think of the usermodehelper_read_trylock() and
usermodehelper_read_lock_wait() as effective *iff* you *are* using
the firmware usermode helper given that we have a solution for
suspend/resume without it. Additionally if we needed any type of
additional suspend/resume check for the native filesystem loader
we should implement it properly using the fw_pm_notify and/or
the struct syscore_ops fw_syscore_ops.

The complexities with using the UMH and suspend/resume should send
chills to anyone still considering that solution.

The flexible driverdata API I'll add would just *skip* the UMH calls
completely, for now best we can do is just compartmentalize it completley
as much as possible as Daniel has been doing.

> > a TODO item to generalize this to a common freezer check. Surprised we don't
> > have one yet. Rafael ?

The above determination of using the proper struct syscore_ops IMHO would
be the appropriate solution.

Some other notes for Daniel:

o feel free to rename __fw_lookup_buf() and related calls so something easier to
understand that this is looking at the *firmware cache*

o If Ming agrees with the above findings I think you should proceed with
stuffing the UMH lock crap into the FW UMH code as you had intended

Luis