Re: [RFC] firmware load: defer request_firmware during early boot and resume

From: Ming Lei
Date: Sat Jul 21 2012 - 15:56:24 EST


Thanks for your so detailed comments.

On Sun, Jul 22, 2012 at 1:31 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Jul 20, 2012 at 5:33 AM, Ming Lei <tom.leiming@xxxxxxxxx> wrote:
>> The RFC patch is just for discussing if the idea of deferring
>> request_firmware is doable for addressing the issue of
>> request_firmware in resume path, which is caused by driver
>> unbind/rebind during resume.
>
> NAK.

Suppose it is not good for resume case, I think it still makes sense
for early boot
situation, at least the patch will support to request firmware inside
init call, and
allow drivers to be built in kernel in case of requesting firmware from probe().

>
> It really *has* to be handled some other way.
>
> This whole approach seems to actively try to *silently* fix up broken
> drivers. And it's wrong.
>
> There's a reason we warn, and there's a reason we *have* to warn: to
> let driver writers know that what they are attempting to do MAY NOT
> WORK.
>
> Really.
>
> Sure, for a lot of devices it's fine to load the firmware later. But
> some devices may be part of the resume sequence in very critical ways,
> and deferring the firmware loading will just mean that the resume will
> fail.
>
> This we *need* the WARN_ON() - so that even in the case where it
> happens to work, people are very much told that "sure, your
> suspend/resume may have worked, but it was doing fundamentally wrong
> things that may mean that for other people it *won't* work".
>
> For example, maybe it's a USB network dongle, and for *YOU* it is
> perfectly fine to defer the firmware loading, so that the network
> comes back up a few seconds after the system is up and running.
>
> But in another machine, that exact same USB network dongle may
> actually be hardwired on the motherboard (it's fairly common to use
> USB as a "system bus" in some laptop and embedded devices), and maybe
> that other machine actually is a thin client that has some tiny rdinit
> thing, and then everything else is NFS-mounted, and if you resume
> without networking, the machine is simply *dead*.
>
> Ok, so that was a completely made-up example, but we have actually had
> situations kind of like that, where a device is just not that
> important for lots of people, but in other situations it's critical
> for the rest of the suspend/resume to succeed.
>
> This is why I'm so vehemently against silently "hiding" these issues.

OK, I see your concerns.

>
> If you have a driver that has problems, make THAT ONE PARTICULAR
> driver do the deferral explicitly. Don't make some generic "silently
> defer if there are problems" patch.

It is a good idea to let the driver defer request explicitly, but still need
some changes in generic code to support it.

>
> See what I'm saying? You're solving things in exactly the wrong place,
> and in exactly the wrong way. You're papering things over, and making
> the generic code silently just make broken cases work. That's really
> really bad, because it makes it *easier* for driver writers to do the
> wrong thing without even thinking about it, and without ever seeing
> the problem. And then when people say "suspend/resume doesn't work",
> the driver author says "it works for me" and ignores the problem.
> Because you've systemically made it easy to ignore the problem, and
> made it easy to do the wrong thing by default.
>
> So we should make driver writers do the right thing by default, and if
> they cannot do the right thing (and the "isight" camera always comes
> up, and f*ck it, just fix that driver) then they should do extra work.

So we can let isight driver to defer its request_firmware explicitly.

>
> Seriously. People should load their firmware *before* the
> suspend/resume cycle. And if that isn't possible, then the system
> should ABSOLUTELY NOT silently say "whatever" and defer it until
> later. We should have that big failure and the big noisy warning, and
> drivers that need to defer need to do so themselves, so that we never
> *ever* have that silent automatic defer situation.

In my opinion, we should cache firmware data for all hotplug
devices or devices which may experience power loss automatically
in kernel during suspend-resume cycle because all such devices may be
disconnected and connected again during suspend-resume cycle.

Looks it is not difficult to cache firmware data by kernel, for example, just
call the

cache_firmware(fw_name)

for each device which need firmware before suspending,
then call the below to uncache firmware after resume:

uncache_firmware(fw_name)

The problem is that many firmwares may consume much
memory, so we still may let drivers to choose if they need to
let kernel cache firmware automatically or just defer the
request_firmware in resume path by unbind & rebind driver to save
memory space, suppose the device is not important wrt. system
resume. Maybe just a few devices can't be allowed to defer
requesting firmware.

So saving memory space is another advantage of the deferral
of request_firmware.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/