Re: [PATCH v2] of: overlay: log the error cause on resolver failure

From: Frank Rowand
Date: Tue Feb 25 2020 - 22:53:50 EST


On 2/25/20 10:45 AM, Luca Ceresoli wrote:
> For some of its error paths, of_resolve_phandles() only logs a very generic
> error which does not help much in finding the origin of the problem:
>
> OF: resolver: overlay phandle fixup failed: -22
>
> Add error messages for all the error paths that don't have one. Now a
> specific message is always emitted, thus also remove the generic catch-all
> message emitted before returning.
>
> For example, in case a DT overlay has a fixup node that is not present in
> the base DT __symbols__, this error is now logged:
>
> OF: resolver: node gpio9 not found in base DT, fixup failed
>
> Signed-off-by: Luca Ceresoli <luca@xxxxxxxxxxxxxxxx>
> Cc: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> ---
>
> I don't know in detail the meaning of the adjust_local_phandle_references()
> and update_usages_of_a_phandle_reference() error paths, thus I have put
> pretty generic messages. Any suggestion on better wording would be welcome.

If you have not read the code to understand what the meaning of
the errors are, you should not be suggesting changes to the error
messages.

Only one of the issues detected as errors can possibly be something
other than an error either in the resolver.c code or the dtc
compiler -- a missing symbol in the live devicetree. This may
be because of failing to compile the base devicetree without
symbols, depending on a symbol from another overlay where the
other overlay has not been applied, or depending on a symbol
from another overlay where the other overlay is applied but
the overlay was not compiled with symbols. (Not meant to be
an exhaustive list, but it might be.) Thus the missing
symbol problem might be fixable without a fix to kernel
code. The error message philosophy for overlay related
errors is to minimize error messages that help diagnose
the precise cause of a kernel code bug, with the intent
of keeping the code more compact and readable. When a
bug occurs, debugging messages can be added for the
debug session.

Following this philosophy, only the message in the second
patch chunk is ok. I will include an example of more
precise error messages in the other locations, just for
education on what is going wrong at those points.


>
> Changed in v2:
>
> - add a message for each error path that does not have one yet
> ---
> drivers/of/resolver.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/of/resolver.c b/drivers/of/resolver.c
> index 83c766233181..a80d673621bc 100644
> --- a/drivers/of/resolver.c
> +++ b/drivers/of/resolver.c
> @@ -291,8 +291,10 @@ int of_resolve_phandles(struct device_node *overlay)
> break;
>
> err = adjust_local_phandle_references(local_fixups, overlay, phandle_delta);
> - if (err)
> + if (err) {
> + pr_err("cannot adjust local phandle references\n");
> goto out;
> + }

Delete this message. But if there was a message, it could be:

"invalid overlay, adjust local phandle references failed\n"


>
> overlay_fixups = NULL;
>
> @@ -321,11 +323,15 @@ int of_resolve_phandles(struct device_node *overlay)
>
> err = of_property_read_string(tree_symbols,
> prop->name, &refpath);
> - if (err)
> + if (err) {
> + pr_err("node %s not found in base DT, fixup failed\n",
> + prop->name);

"symbol '%s' not found in live devicetree symbols table\n",
prop->name


> goto out;
> + }
>
> refnode = of_find_node_by_path(refpath);
> if (!refnode) {
> + pr_err("cannot find node for %s\n", refpath);
> err = -ENOENT;
> goto out;
> }
> @@ -334,13 +340,14 @@ int of_resolve_phandles(struct device_node *overlay)
> of_node_put(refnode);
>
> err = update_usages_of_a_phandle_reference(overlay, prop, phandle);
> - if (err)
> + if (err) {
> + pr_err("cannot update usages of a phandle reference (%s)\n",
> + prop->name);
> break;
> + }

Delete this message. But if there was a message, it could be:

"invalid fixup for symbol '%s'\n", prop->name


> }
>
> out:
> - if (err)
> - pr_err("overlay phandle fixup failed: %d\n", err);

Do not remove this message. The other messages do not explain that phandle fixup
failed - they provide a more detailed description of a specific reason _why_ the
phandle fixup failed.


> of_node_put(tree_symbols);
>
> return err;
>