Re: [PATCH v2] of: overlay: log the error cause on resolver failure

From: Frank Rowand
Date: Thu Feb 27 2020 - 14:30:49 EST


On 2/27/20 2:11 AM, Luca Ceresoli wrote:
> Hi Frank,
>
> On 26/02/20 04:53, Frank Rowand wrote:
>> On 2/25/20 10:45 AM, Luca Ceresoli wrote:
>>> For some of its error paths, of_resolve_phandles() only logs a very generic
>>> error which does not help much in finding the origin of the problem:
>>>
>>> OF: resolver: overlay phandle fixup failed: -22
>>>
>>> Add error messages for all the error paths that don't have one. Now a
>>> specific message is always emitted, thus also remove the generic catch-all
>>> message emitted before returning.
>>>
>>> For example, in case a DT overlay has a fixup node that is not present in
>>> the base DT __symbols__, this error is now logged:
>>>
>>> OF: resolver: node gpio9 not found in base DT, fixup failed
>>>
>>> Signed-off-by: Luca Ceresoli <luca@xxxxxxxxxxxxxxxx>
>>> Cc: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
>>> ---
>>>
>>> I don't know in detail the meaning of the adjust_local_phandle_references()
>>> and update_usages_of_a_phandle_reference() error paths, thus I have put
>>> pretty generic messages. Any suggestion on better wording would be welcome.
>>
>> If you have not read the code to understand what the meaning of
>> the errors are, you should not be suggesting changes to the error
>> messages.
>>
>> Only one of the issues detected as errors can possibly be something
>> other than an error either in the resolver.c code or the dtc
>> compiler -- a missing symbol in the live devicetree. This may
>> be because of failing to compile the base devicetree without
>> symbols, depending on a symbol from another overlay where the
>> other overlay has not been applied, or depending on a symbol
>> from another overlay where the other overlay is applied but
>> the overlay was not compiled with symbols. (Not meant to be
>> an exhaustive list, but it might be.) Thus the missing
>> symbol problem might be fixable without a fix to kernel
>> code. The error message philosophy for overlay related
>> errors is to minimize error messages that help diagnose
>> the precise cause of a kernel code bug, with the intent
>> of keeping the code more compact and readable. When a
>> bug occurs, debugging messages can be added for the
>> debug session.
>
> Got it, sorry about that.
>
>> Following this philosophy, only the message in the second
>> patch chunk is ok.
>
> Then I think you can apply the v1 patch which only contains the message
> about the problem I experienced, and which was caused by an incorrect DTO:
>
> https://patchwork.ozlabs.org/patch/1243987/
>
> Just ignore the note saying the patch is not for mainline, it's wrong.
>

Mostly yes, v1 contains the one place a message should be added.

Let me bike shed a little bit though.

I suggested a different wording for the message in v2, but I
do not think my attempt at wording was precise enough. I
would instead suggest:

"node label '%s' not found in live devicetree symbols table\n"

Some subtle differences.
- It is a node label, not a node name.
- If multiple overlays are applied, then the intention may have
been to supply the node label via a previously applied overlay
instead of from the base devicetree. So specifying the live
devicetree is more accurate.

Please submit v3 for mainline.

Thanks,

Frank