Re: [PATCH] arm64: configurable sparsemem section size

From: Dan Williams
Date: Wed Apr 24 2019 - 16:24:31 EST


On Wed, Apr 24, 2019 at 12:54 PM Pavel Tatashin
<pasha.tatashin@xxxxxxxxxx> wrote:
>
> <resending> from original email
>
> On Wed, Apr 24, 2019 at 3:48 PM Pavel Tatashin
> <patatash@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Apr 24, 2019 at 5:07 AM Anshuman Khandual
> > <anshuman.khandual@xxxxxxx> wrote:
> > >
> > > On 04/24/2019 02:08 AM, Pavel Tatashin wrote:
> > > > The sparsemem section size determines the maximum size and alignment that
> > > > is allowed when offlining/onlining a memory block. The bigger the size, the
> > > > less clutter in /sys/devices/system/memory/*. On the other hand, there is
> > > > less flexibility in what granules of memory can be added and removed.
> > >
> > > Is there any scenario where less than 1GB needs to be added on arm64?
> >
> > Yes, DAX hotplug loses 1G of memory without allowing smaller sections.
> > Machines on which we are going to be using this functionality have 8G
> > of System RAM, therefore losing 1G is a big problem.
> >
> > For details about the usage scenario, see this cover letter:
> > https://lore.kernel.org/lkml/20190421014429.31206-1-pasha.tatashin@xxxxxxxxxx/
> >
> > >
> > > >
> > > > Recently, Linux gained the ability to hot-add persistent memory that
> > > > can be either a real NV device or reserved from regular System RAM,
> > > > and that has the identity of devdax.
> > >
> > > devdax (even ZONE_DEVICE) support has not been enabled on arm64 yet.
> >
> > Correct, I use your patches to enable ZONE_DEVICE, and thus devdax on ARM64:
> > https://lore.kernel.org/lkml/1554265806-11501-1-git-send-email-anshuman.khandual@xxxxxxx/
> >
> > >
> > > >
> > > > The problem is that because ARM64's section size is 1G, and devdax must
> > > > have a 2M label section, the first 1G is always missed when the device is
> > > > attached, because it is not 1G aligned.
> > >
> > > Does devdax have to be 2M aligned? Does Linux enforce that right now?
> >
> > Unfortunately, there is no way around this. Part of the memory can be
> > reserved as persistent memory via device tree.
> > memory@40000000 {
> >         device_type = "memory";
> >         reg = < 0x00000000 0x40000000
> >                 0x00000002 0x00000000 >;
> > };
> >
> > pmem@1c0000000 {
> >         compatible = "pmem-region";
> >         reg = < 0x00000001 0xc0000000
> >                 0x00000000 0x80000000 >;
> >         volatile;
> >         numa-node-id = <0>;
> > };
> >
> > So, while pmem is section aligned, as it should be, the dax device is
> > going to be pmem start address + label size, which is 2M. The actual
> > DAX device starts at:
> > 0x1c0000000 + 2M.
> >
> > Because the section size is 1G, hotplug will be able to add memory only
> > starting from
> > 0x1c0000000 + 1G

This is yet another example of where we need to break down the section
alignment requirement for arch_add_memory().

https://lore.kernel.org/lkml/155552633539.2015392.2477781120122237934.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/