Re: [PATCHv2 0/3] Enable no_cache flag to driver_data

From: Luis R. Rodriguez
Date: Wed Jun 07 2017 - 13:59:30 EST


On Tue, Jun 06, 2017 at 02:31:54PM -0500, Li, Yi wrote:
> > We use the cache upon suspend to cache the firmware so that upon resume a
> > request will use that cache, to avoid the file lookup on disk. Doing a test
> > with qemu suspend + resume is possible but that requires having access to
> > the qemu monitor interface and doing something like this to trigger a wakeup:
> >
> > echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl

BTW if it helps for testing:

https://github.com/mcgrof/kvm-boot

> I am studying at the firmware caching codes and have to say it's very
> complicated. :-( Here are some questions:

It is! For this reason I tried to document it, have you read this:

https://www.kernel.org/doc/html/latest/driver-api/firmware/firmware_cache.html

This is certainly missing the notes about how some of this is loosely re-used
for possible duplicate requests though, but I think I've mentioned this on
my review of your patches so far.

> 1. Since device_cache_fw_images invokes dev_cache_fw_image through
> dpm_for_each_dev, adding a debugfs driver to kick it can only cache firmware
> for those associated with devices which has PM enabled, which do not include
> the driver_data_test_device. Any suggestions?

Ah, consider adding PM to driver_data_test_device ?

May be worth updating the documentation bout this requirement too!

> 2. Look into dev_cache_fw_image function, devres_for_each_res will walk
> through the firmware have been loaded before (through assign_firmware_buf ->
> fw_add_devm_name) and add to the todo list, eventually it will create the
> fw_names list. So in the test driver, we need to load the firmware once
> before calling the kick?

Yes, makes sense, given its a firmware cache, it would only make sense to
cache firmware for requests already made.

An important note I made in the documentation which you did not mention
but should be important for your own sanity:

"The firmware devres entry is maintained throughout the lifetime of the device.
This means that even if you release_firmware() the firmware cache will still be
used on resume from suspend."

This means the cache will be tried even if the driver did use release_firmware()
after request_firmware().

> > static void device_cache_fw_images(void)
> > {
> > ...
> > fwc->state = FW_LOADER_START_CACHE;
> > dpm_for_each_dev(NULL, dev_cache_fw_image);
> > ...
> > }
> This only applies to the devices have PM enabled.

Makes sense!

>
> device_uncache_fw_images_delay() will be called for PM_POST_RESTORE, which
> means the hibernation restore got error. How about the successful restore
> case, calling driver will free its firmware_buf after loading?

As it is on my latest 20170605-driver-data [0] (but should be just as yours if you
used a recent branch of mine), fw_pm_notify() uses:

static int fw_pm_notify(struct notifier_block *notify_block,
unsigned long mode, void *unused)
{
switch (mode) {
...
case PM_POST_SUSPEND:
case PM_POST_HIBERNATION:
case PM_POST_RESTORE:
/*
* In case that system sleep failed and syscore_suspend is
* not called.
*/
mutex_lock(&fw_lock);
fw_cache.state = FW_LOADER_NO_CACHE;
mutex_unlock(&fw_lock);
enable_firmware();

device_uncache_fw_images_delay(10 * MSEC_PER_SEC);
break;
}
}

So it uses also PM_POST_RESTORE. I think that covers it all ? As it
is today we handle the firmware cache for all drivers, it should be
transparent to drivers.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data

> > To me this last part smells like a possible source of issue (not sure) if we might
> > suspend/resume a lot in short period of time, this theory could be tested by toggling
> > on/of this debugfs interface I suggested while having requests of different types
> > blast in.
> Agree, it might create an issue if the system is get into restore_prepare
> again before the device_uncache_fw_images_delay clear the cache, why we need
> the 10 * MSEC_PER_SEC delay? In theory, fwc->name_lock should protect the
> case though.

One possible solution may be to cancel_delayed_work(&fw_cache.work)
or cancel_delayed_work_sync(&fw_cache.work) on suspend, and then
force it to run *right away* so we clear any pending old cache.

One performance issue with this is we are clearing some possible cache we are
about to re-create, so mod_delayed_work() seems more efficient -- provided we
have accounting notes to know no new firmware requests have trickled in since
the last cache build. This seems rather complicated so a first initial approach
to just kill all known cache before suspend seems easy and fair.

BTW such a fix might be a stable candidate if you find a real behavioural
issue with the kernel !

> > This is the sort of testing which would really help here.
> >
> > Likewise, if you are extending functionality please consider ways to break it :)
> Understand the need to test the firmware caching part. For non-caching test
> case, will it be enough if we can test that the noncache setting will ban
> the firmware name be added to the fwc->fw_names list?

Good question. Let's see... in the future a device might have two requests
with the same firmware, one with a cache enabled and one without it -- I suppose
for now we should perhaps disable a non-cache request if one already exists
with a cache request ? The code for that probably is not there... But other than
this corner case...

Yeah I think that's a reasonable test case. Such a code seems may need a debug
export symbol to allow drivers to query for this info, if so feel free to a
respective debug kconfig entry for it.

Luis