Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed

From: Pierre-Louis Bossart
Date: Tue Sep 13 2022 - 14:36:44 EST

>>>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>>>> --- a/drivers/soundwire/bus.c
>>>>> +++ b/drivers/soundwire/bus.c
>>>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>>>>        struct sdw_slave_id id;
>>>>>        struct sdw_msg msg;
>>>>>        bool found;
>>>>> -    int count = 0, ret;
>>>>> +    int count = 0, num_programmed = 0, ret;
>>>>>        u64 addr;
>>>>>          /* No Slave, so use raw xfer api */
>>>>>        ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>>>>                   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>>>>        if (ret < 0)
>>>>> -        return ret;
>>>>> +        return 0;
>>>>
>>>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>>>> handled in sdw_fill_msg().
>>>>
>>>> I didn't check if all these error cases are irrelevant in that specific
>>>> enumeration case, if that was the case maybe we need to break that
>>>> function in two helpers so that all the checks can be skipped.
>>>>
>>>
>>> I don't think that there's anything useful that
>>> sdw_modify_slave_status() could do to recover from an error.
>>>
>>> If any device IDs were programmed then, according to the statement in
>>> sdw_modify_slave_status()
>>>
>>>      * programming a device number will have side effects,
>>>      * so we deal with other devices at a later time
>>>
>>> if this is true, then we need to exit to deal with what _was_
>>> programmed, even if one of them failed.
>>>
>>> If nothing was programmed, and there was an error, we can't bail out of
>>> sdw_modify_slave_status(). We have status for other devices which
>>> we can't simply ignore.
>>>
>>> Ultimately I can't see how pushing the error code up is useful.
>>> sdw_modify_slave_status() can't really do any effective recovery action,
>>> and the original behavior of giving up and returning means that
>>> an error in programming dev ID potentially causes collateral damage to
>>> the status of other peripherals.
>>
>> I was suggesting something like
>>
>>
>> void sdw_fill_msg_data(...)
>> {
>>    copy data in the msg structure
>> }
>>
>> int sdw_fill_msg(...)
>> {
>>      sdw_fill_msg_data();
>>      handle_error_cases
>> }
>>
>> and in sdw_program_device_num() we call sdw_fill_msg_data() directly
>>
>> So no change in functionality beyond explicit skip of error checks that
>> are not relevant and cannot be handled even if they were.
>>
>
> sdw_fill_msg() will never report an error during
> sdw_program_device_num() because the first check is to return if
> the address doesn't need paging, and sdw_program_device_num() only
> accesses SCP registers.
>
> I don't want to mix coding improvements with bugfixes. Splitting
> sdw_fill_msg() isn't needed to fix this bug.

It's not required, but it helps remove a useless always-false condition.
We have way too many error cases in the bus code, most of which have
never been tested. Agreed that it can be done later; it's just that
reviewing this code change exposes things that were not noticed before.
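
For reference, a rough userspace sketch of the split suggested earlier in
the thread. This is not the kernel code: struct sdw_msg is a simplified
stand-in, the slave argument is dropped, the register values are assumed
for illustration, and the paging checks are collapsed into a single
placeholder. sdw_fill_msg_data() is the hypothetical helper name from the
pseudocode, not an existing function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in values, assumed for illustration; check the kernel headers. */
#define SDW_SCP_DEVID_0          0x50u
#define SDW_NUM_DEV_ID_REGISTERS 6
#define SDW_REG_NO_PAGE          0x00008000u
#define SDW_MSG_FLAG_READ        0

/* Simplified stand-in for the kernel's struct sdw_msg. */
struct sdw_msg {
	uint16_t addr;
	uint16_t len;
	uint8_t dev_num;
	uint8_t flags;
	uint8_t *buf;
};

/* Hypothetical helper: only copies the fields, so it cannot fail. */
void sdw_fill_msg_data(struct sdw_msg *msg, uint32_t addr, size_t count,
		       uint8_t dev_num, uint8_t flags, uint8_t *buf)
{
	assert(msg != NULL);
	memset(msg, 0, sizeof(*msg));
	msg->addr = (uint16_t)addr;	/* truncated to 16 bits */
	msg->len = (uint16_t)count;
	msg->dev_num = dev_num;
	msg->flags = flags;
	msg->buf = buf;
}

/* Checked variant: fills the message, then validates paging. */
int sdw_fill_msg(struct sdw_msg *msg, uint32_t addr, size_t count,
		 uint8_t dev_num, uint8_t flags, uint8_t *buf)
{
	sdw_fill_msg_data(msg, addr, count, dev_num, flags, buf);

	if (addr < SDW_REG_NO_PAGE)	/* no paging needed, nothing to check */
		return 0;

	/* Placeholder for the real paging checks, not reproduced here. */
	return -EINVAL;
}
```

With this shape, sdw_program_device_num() could call sdw_fill_msg_data()
with SDW_SCP_DEVID_0 and would have no return value to test, which removes
the always-false "if (ret < 0)" branch instead of changing what it returns.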