Re: (EXT) Re: (EXT) Re: [PATCH] of: skip disabled CPU nodes

From: Matthias Schiffer
Date: Thu Aug 27 2020 - 03:10:55 EST


On Wed, 2020-08-26 at 13:26 -0600, Rob Herring wrote:
> On Wed, Aug 26, 2020 at 8:47 AM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
> >
> > Hi Rob,
> >
> > On 2020-08-26 08:54, Matthias Schiffer wrote:
> > > On Wed, 2020-08-26 at 08:01 -0500, Frank Rowand wrote:
> > > > On 2020-08-26 07:02, Matthias Schiffer wrote:
> > > > > Allow disabling CPU nodes using status = "disabled".
> > > > >
> > > > > > This allows a bootloader to change the number of available CPUs
> > > > > > (for example when a common DTS is used for SoC variants with
> > > > > > different numbers of cores) without deleting the nodes altogether
> > > > > > (which may require additional fixups where the CPU nodes are
> > > > > > referenced, e.g. a cooling map).
> > > > >
> > > > > > Signed-off-by: Matthias Schiffer <matthias.schiffer@xxxxxxxxxxxxxxx>
> > > > >
> > > > > ---
> > > > > drivers/of/base.c | 2 ++
> > > > > 1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > > > index ea44fea99813..d547e9deced1 100644
> > > > > --- a/drivers/of/base.c
> > > > > +++ b/drivers/of/base.c
> > > > > > @@ -796,6 +796,8 @@ struct device_node *of_get_next_cpu_node(struct device_node *prev)
> > > > > >  		of_node_put(node);
> > > > > >  	}
> > > > > >  	for (; next; next = next->sibling) {
> > > > > > +		if (!__of_device_is_available(next))
> > > > > > +			continue;
> > > > > >  		if (!(of_node_name_eq(next, "cpu") ||
> > > > > >  		      __of_node_is_type(next, "cpu")))
> > > > > >  			continue;
> > > > >
> > > >
> > > > The original implementation of of_get_next_cpu_node() had
> > > > that check, but status disabled for cpu nodes has different
> > > > semantics than other nodes, and the check broke some systems.
> > > > The check was removed by c961cb3be906 "of: Fix cpu node
> > > > iterator to not ignore disabled cpu nodes".
> > > >
> > > > It would be useful to document that difference in the
> > > > header comment of of_get_next_cpu_node().
> > > >
> > > > -Frank
> > >
> > > Hmm, I see. This difference in behaviour is quite unfortunate, as I'm
> > > currently looking for a way to *really* disable a CPU core.
> > >
> > > In arch/arm64/boot/dts/freescale/imx8mn.dtsi (and other variants of the
> > > i.MX8M), there are 4 CPU nodes for the full-featured quad-core version.
> > > The reduced single- and dual-core versions are currently handled in
> > > NXP's U-Boot fork by deleting the additional nodes.
> > >
> > > Not doing so causes the kernel to hang for a while when trying to
> > > online the non-existent cores during boot (at least in linux-imx 5.4 -
> > > I have not checked a more recent mainline kernel yet), but the deletion
> > > is non-trivial to do without leaving dangling phandle references.
> >
> > Any thoughts on implementing another universal property that means
> > something like "the hardware described by this node does not exist
> > or is so broken that you better not use it".
>
> There's a couple of options:
>
> The DT spec defines 'fail' value for status. We could use that instead
> of 'disabled'.
>
> The spec behavior with cpu 'disabled' is only on PPC AFAIK. On
> arm/arm64 (probably riscv now too) we've never followed it where we
> online 'disabled' CPUs. So we could just make the check conditional on
> !IS_ENABLED(CONFIG_PPC). This would need some spec update.

On ARM(64), the "disabled" status on CPU nodes currently has no effect.
I assume this changed with the mentioned commit c961cb3be906 ("of: Fix
cpu node iterator to not ignore disabled cpu nodes"), as reverting it
gives me the desired behaviour of treating the disabled CPUs as
non-existent.

So it seems that we already changed the interpretation in an
incompatible way once (back in v4.20), and PPC behaves differently yet
again?

How do we get out of this mess? Is going back to the v4.19 logic for
non-PPC platforms an acceptable regression fix, or would this be
considered another breaking change?
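
To make the question concrete, here is a minimal user-space sketch of the
v4.19-style logic restricted to non-PPC platforms. It is not kernel code
and not a proposed patch: struct node, node_is_available(),
next_cpu_node() and the skip_disabled flag are hypothetical stand-ins for
struct device_node, __of_device_is_available(), of_get_next_cpu_node()
and !IS_ENABLED(CONFIG_PPC). Locking and reference counting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node; not the kernel's type. */
struct node {
	const char *name;
	const char *status;	/* NULL means the status property is absent */
	struct node *sibling;
};

/* A node counts as available if status is absent, "okay" or "ok". */
static bool node_is_available(const struct node *n)
{
	return !n->status || !strcmp(n->status, "okay") ||
	       !strcmp(n->status, "ok");
}

/*
 * Return the CPU node following prev, or the first one if prev is NULL.
 * skip_disabled models !IS_ENABLED(CONFIG_PPC): when set, unavailable
 * CPU nodes are treated as non-existent.
 */
static struct node *next_cpu_node(struct node *head, struct node *prev,
				  bool skip_disabled)
{
	struct node *next = prev ? prev->sibling : head;

	for (; next; next = next->sibling) {
		if (skip_disabled && !node_is_available(next))
			continue;
		if (strcmp(next->name, "cpu"))
			continue;
		break;
	}
	return next;
}

/* Count the CPU nodes the iterator hands out. */
static int count_cpus(struct node *head, bool skip_disabled)
{
	struct node *c;
	int n = 0;

	for (c = next_cpu_node(head, NULL, skip_disabled); c;
	     c = next_cpu_node(head, c, skip_disabled))
		n++;
	return n;
}

/* Quad-core DTS on a dual-core part: the bootloader disabled two cores. */
static void example(void)
{
	struct node c3 = { "cpu", "disabled", NULL };
	struct node c2 = { "cpu", "disabled", &c3 };
	struct node l2 = { "l2-cache", NULL, &c2 };
	struct node c1 = { "cpu", "okay", &l2 };
	struct node c0 = { "cpu", NULL, &c1 };

	assert(count_cpus(&c0, true) == 2);	/* non-PPC: disabled cores skipped */
	assert(count_cpus(&c0, false) == 4);	/* PPC: all cores still visible */
}
```

With the flag set, the two disabled cores never reach the code that tries
to online them, while the nodes (and any phandles pointing at them) stay
in the tree.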



>
> > Matthias, if Rob thinks that is a good idea, then you should start
> > with a new proposal that is also sent to
> > devicetree-spec@xxxxxxxxxxxxxxx
> >
> > -Frank
> >
> > >
> > > Kind regards,
> > > Matthias
> > >