Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

From: Luis R. Rodriguez
Date: Wed May 24 2017 - 18:39:00 EST


On Wed, May 24, 2017 at 3:00 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> On Wed, May 24, 2017 at 2:40 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
>> From: Martin Fuzzey <mfuzzey@xxxxxxxxxxx>
>>
>> Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
>> is interrupted") added via 4.0 added support to abort the fallback mechanism
>> when a signal was detected and wait_for_completion_interruptible() returned
>> -ERESTARTSYS. Although the abort was effective we were unfortunately never
>> really propagating this error though and as such userspace could not know
>> why the abort happened.
>
> Can you give a simple example of what's going on and why it matters?
>
> ERESTARTSYS and friends are highly magical, and I'm not convinced that
> allowing _request_firmware_load to return -ERESTARTSYS is actually a
> good idea. What if there are system calls that can't handle this
> style of restart that start being restarted as a result?

This seems to be a linux-api question, so Cc'ing them and Michael.

For those not familiar it is worth explaining first the user interface.

This describes the fallback mechanism of the Linux firmware API if
direct filesystem lookup fails. Although most distros disable this
stuff today some distros (Android) seem to be always relying on it
today. Other than Android, since most distros have the forced fallback
mechanism off but *enable* requests to require it, only 2 upstream
drivers do this, so on other distros we only have 2 drivers which
typically require the the fallback mechanism.

The exposed user interface here is we enable a knob through sysfs to
enable userspace to write a file as a fallback mechanism then, and we
expect to get woken up today when userspace finds the needed file and
then writes the file to the sysfs knob. We wait for userspace using
swait_event_interruptible_timeout(). While we wait we can get a
-ERESTARTSYS since swait_event_interruptible_timeout() uses
__swait_event_interruptible_timeout() under the hood and this in turn
___swait_event() which can prepare_to_swait_event() which can return
-ERESTARTSYS on signal_pending_state().

The issue discovered was that Android could issue SIGCHLD and the
waiter gets a signal but the reason for the exact reason for the
failure is not propagated. The proposed patch propagates -ERESTARTSYS
when that is returned on signal_pending_state() as we wait.

So linux-api folks please speak up.

Luis