Re: [PATCH v6] ARM: boot: Obtain start of physical memory from DTB

From: Russell King - ARM Linux admin
Date: Tue May 19 2020 - 07:44:08 EST


On Tue, May 19, 2020 at 01:21:09PM +0200, Geert Uytterhoeven wrote:
> Hi Russell,
>
> CC devicetree
>
> On Tue, May 19, 2020 at 11:46 AM Russell King - ARM Linux admin
> <linux@xxxxxxxxxxxxxxx> wrote:
> > On Tue, May 19, 2020 at 11:44:17AM +0200, Geert Uytterhoeven wrote:
> > > On Tue, May 19, 2020 at 10:54 AM Lukasz Stelmach <l.stelmach@xxxxxxxxxxx> wrote:
> > > > It was <2020-04-29 śro 10:21>, when Geert Uytterhoeven wrote:
> > > > > Currently, the start address of physical memory is obtained by masking
> > > > > the program counter with a fixed mask of 0xf8000000. This mask value
> > > > > was chosen as a balance between the requirements of different platforms.
> > > > > However, this does require that the start address of physical memory is
> > > > > a multiple of 128 MiB, precluding booting Linux on platforms where this
> > > > > requirement is not fulfilled.
> > > > >
> > > > > Fix this limitation by obtaining the start address from the DTB instead,
> > > > > if available (either explicitly passed, or appended to the kernel).
> > > > > Fall back to the traditional method when needed.
> > > > >
> > > > > This allows booting Linux on r7s9210/rza2mevb using the 64 MiB of
> > > > > SDRAM on the RZA2MEVB sub board, which is located at 0x0C000000
> > > > > (CS3 space), i.e. not at a multiple of 128 MiB.
> > > > >
> > > > > Suggested-by: Nicolas Pitre <nico@xxxxxxxxxxx>
> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> > > > > Reviewed-by: Nicolas Pitre <nico@xxxxxxxxxxx>
> > > > > Reviewed-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > > > Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> > > > > Tested-by: Dmitry Osipenko <digetx@xxxxxxxxx>
> > > > > ---
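
[For reference, the DTB-based lookup described above boils down to
roughly the following sketch. This is an illustration only, not the
actual patch; it assumes libfdt is usable by the decompressor, and
that the memory node sits at "/memory" with a single 32-bit reg
entry (real code must also handle "/memory@...", #address-cells and
#size-cells):]

#include <libfdt.h>

unsigned long get_mem_start(const void *fdt, unsigned long pc)
{
	int node, len;
	const fdt32_t *reg;

	if (fdt && fdt_check_header(fdt) == 0) {
		node = fdt_path_offset(fdt, "/memory");
		if (node >= 0) {
			reg = fdt_getprop(fdt, node, "reg", &len);
			if (reg && len >= sizeof(*reg))
				return fdt32_to_cpu(reg[0]);
		}
	}

	/* Fall back to the traditional masking technique. */
	return pc & 0xf8000000;
}
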
> > > >
> > > > [...]
> > > >
> > > > Apparently reading the physical memory layout from the DTB breaks
> > > > crashdump kernels. A crashdump kernel is loaded into a region of
> > > > memory that is reserved in the original (i.e. to-be-crashed) kernel.
> > > > The reserved region is large enough for the crashdump kernel to run
> > > > completely inside it without modifying anything outside it; it only
> > > > reads and dumps the remains of the crashed kernel. Using the
> > > > information from the DTB makes the decompressor place the kernel
> > > > outside of the dedicated region.
> > > >
> > > > The log below shows that the zImage and DTB are loaded at 0x18eb8000
> > > > and 0x193f6000 (physical). The kernel is expected to run at
> > > > 0x18008000, but it is decompressed to 0x00008000 (see r4 reported
> > > > before jumping from within __enter_kernel). If I were to suggest
> > > > something, there needs to be one more bit of information passed in
> > > > the DTB, telling the decompressor to use the old masking technique
> > > > to determine the kernel address. It would be set in the DTB loaded
> > > > along with the crashdump kernel.
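
[As an illustration of that suggestion, a hypothetical /chosen hint
could be checked along these lines. The property name below is made
up for the sketch; for comparison, arm64 crash dump kernels use the
existing /chosen/linux,usable-memory-range binding to restrict the
memory a crash kernel may use:]

#include <libfdt.h>

/*
 * Sketch only: the "linux,use-masked-base" property name is
 * hypothetical, invented here for illustration.
 */
int crashkernel_wants_masked_base(const void *fdt)
{
	int chosen = fdt_path_offset(fdt, "/chosen");

	return chosen >= 0 &&
	       fdt_getprop(fdt, chosen, "linux,use-masked-base",
			   NULL) != NULL;
}
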
> > >
> > > Shouldn't the DTB passed to the crashkernel describe which region of
> > > memory is to be used instead?
> >
> > Definitely not. The crashkernel needs to know where the RAM in the
> > machine is, so that it can create a coredump of the crashed kernel.
>
> So the DTB should describe both ;-)
>
> > > Describing "to use the old masking technique" sounds a bit hackish to me.
> > > I guess it cannot just restrict the /memory node to the reserved region,
> > > as the crashkernel needs to be able to dump the remains of the crashed
> > > kernel, which lie outside this region.
> >
> > Correct.
> >
> > > However, something under /chosen should work.
> >
> > Yet another sticky plaster...
>
> IMHO the old masking technique is the hacky solution covered by
> plasters.

One line of code is not "covered by plasters". There are no plasters.
It's a solution that works for 99.99% of people, unlike your approach,
which has had a stream of issues over the last four months and has
required many reworks of the code to fix each one. That in itself
speaks volumes about the suitability of the approach.

> DT describes the hardware.

Right, so DT is correct.

> In general, where to put the kernel is a
> software policy, and thus doesn't belong in DT, except perhaps under
> /chosen. But that would open another can of worms, as people usually
> have no business specifying where the kernel should be located.
> In the crashkernel case, there is a clear separation between memory to
> be used by the crashkernel, and memory to be solely inspected by the
> crashkernel.
>
> Devicetree Specification, Release v0.3, Section 3.4 "/memory node" says:
>
> "The client program may access memory not covered by any memory
> reservations (see section 5.3)"
>
> (Section 5.3 "Memory Reservation Block" only talks about structures in
> the FDT, not about DTS)
>
> Hence according to the above, the crashkernel is rightfully allowed to
> do whatever it wants with all memory under the /memory node.
> However, there is also
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt.
> This suggests the crashkernel should be passed a DTB that contains a
> /reserved-memory node, describing which memory cannot be used freely.
> Then the decompressor needs to take this into account when deciding
> where to put the kernel.
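
[For illustration, the kind of check that would be required might
look roughly like this. A sketch only: it assumes 32-bit
#address-cells/#size-cells and a single reg entry per child node,
neither of which real code may assume:]

#include <stdbool.h>
#include <stdint.h>
#include <libfdt.h>

/* Does [base, base + size) overlap any /reserved-memory child? */
static bool overlaps_reserved_memory(const void *fdt,
				     uint32_t base, uint32_t size)
{
	int parent = fdt_path_offset(fdt, "/reserved-memory");
	int node;

	if (parent < 0)
		return false;	/* no reservations described */

	fdt_for_each_subnode(node, fdt, parent) {
		int len;
		const fdt32_t *reg = fdt_getprop(fdt, node, "reg", &len);

		if (reg && len >= 2 * sizeof(*reg)) {
			uint32_t rbase = fdt32_to_cpu(reg[0]);
			uint32_t rsize = fdt32_to_cpu(reg[1]);

			if (base < rbase + rsize && rbase < base + size)
				return true;
		}
	}

	return false;
}

[The decompressor would then step its candidate load address past any
overlapping reservation before decompressing the kernel there.]
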

So you want to throw yet more complexity at this solution to try and
make it work...

> Yes, the above requires changing code. But at least it provides a
> path forward, getting rid of the fragile old masking technique.

It's hardly fragile when it's worked fine for the last 20+ years,
whereas your solution can't work without some regression being reported
within a couple of weeks of it being applied. Again, that speaks
volumes about which one is better than the other.

Continually patching this approach to work around one issue after
another shows that it is _this_ solution that is the fragile
implementation.

A fragile implementation is by definition one that keeps breaking.
That's yours. The masking approach hasn't "broken" for anyone, and
hasn't been the cause of a single regression anywhere. Yes, there
are some platforms it doesn't work for (because they choose to
reserve the first chunk of RAM for something), but that is not a
regression.

So, I'm not going to apply the next revision of this patch for at least
one whole kernel cycle - that means scheduling it for 5.10-rc at the
earliest.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up