Re: [PATCH driver-core/master] firmware: Correct handling of fw_state_wait_timeout() return value

From: Jakub Kicinski
Date: Mon Jan 16 2017 - 14:13:40 EST


On Mon, Jan 16, 2017 at 10:29 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> On Mon, Jan 16, 2017 at 02:57:06PM +0000, Jakub Kicinski wrote:
>> Commit 5d47ec02c37e ("firmware: Correct handling of fw_state_wait()
>> return value") made the assumption that any error returned from
>> fw_state_wait_timeout() means FW load has to be aborted. This is
>> incorrect FW load only has to be aborted when load timed out or
>
> You want a comma before FW -- but also:

Thanks!

>> has been interrupted,
>
> __fw_state_wait_common() returns -ENOENT when:
>
> if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> return -ENOENT;
>
> Why not for when -ENOENT is returned ?

I'm just going back to the pre-5d47ec02c37e behavior, I don't get all
the details of this code. My understanding is that pre-5d47ec02c37e
we were only aborting on ret == 0 (i.e. timeout) or -ERESTARTSYS.

>> otherwise the waking thread had already
>> cleaned up for us.
>
> What code in what waking thread would have done precisely what cleanup?

That is not clear to me. The waking is done in
firmware_loading_store(). I don't follow why firmware_loading_store()
is using fw_load_abort() in -1 case and fw_state_aborted() on an error
path of the 0 case (it's pre-git era stuff). I assume the
fw_load_abort() unlinks the buffer so that next calls to store will
error out in the check on line 716. I was initially going to change
that fw_load_abort() to *_aborted() but I'm afraid of the slight
change in user-visible behavior.

> And why can't fw_load_abort() handle being called twice and why not just
> instead allow for that?

Personal preference of making sure code is correct and not just able
to handle errors, I guess.

>> Fixes: 5d47ec02c37e ("firmware: Correct handling of fw_state_wait() return value")
>
> What does this fix exactly? A fix should describe the impact, what
> issues are in place without the fix. What also happens after the fix
> and why. In this commit log none of this is clear.

Sorry :S The bug report was here:
http://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1310204.html
I should've done a better job, the tl;dr is that calling *_abort()
again in case user helper wrote -1 (FW not found) is causing a
NULL-deref.

>> Signed-off-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
>> ---
>> drivers/base/firmware_class.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
>> index 4497d263209f..ce142e6b2c72 100644
>> --- a/drivers/base/firmware_class.c
>> +++ b/drivers/base/firmware_class.c
>> @@ -1020,7 +1020,7 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
>> }
>>
>> retval = fw_state_wait_timeout(&buf->fw_st, timeout);
>> - if (retval < 0) {
>> + if (retval == -ETIMEDOUT || retval == -ERESTARTSYS) {
>
> Also, if your change is correct I will also note fw_state_wait_timeout()
> is just a wrapper for __fw_state_wait_common(), but we also have
> another wrapper for __fw_state_wait_common() now:
>
> #define fw_state_wait(fw_st) \
> __fw_state_wait_common(fw_st, MAX_SCHEDULE_TIMEOUT)
>
> Do we need to fix anything for fw_state_wait() ?

I looked at it and I think it's fine.

> Clarifying all this would help review your proposed changes. If you
> consider them a fix please be very clear as to the exact issue and
> what is fixed with your patch.

Sorry again, I hope things are clearer now.