Re: [PATCH v2 15/17] libnvdimm: Set numa_node to NVDIMM devices

From: Toshi Kani
Date: Thu Jun 25 2015 - 17:52:21 EST


On Thu, 2015-06-25 at 14:31 -0700, Dan Williams wrote:
> On Thu, Jun 25, 2015 at 11:34 AM, Williams, Dan J
> <dan.j.williams@xxxxxxxxx> wrote:
> > On Thu, 2015-06-25 at 11:45 -0600, Toshi Kani wrote:
> >> On Thu, 2015-06-25 at 05:37 -0400, Dan Williams wrote:
> >> > From: Toshi Kani <toshi.kani@xxxxxx>
> >> >
> >> > ACPI NFIT table has System Physical Address Range Structure entries that
> >> > describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
> >> > set in the flags.
> >> >
> >> > Change acpi_nfit_register_region() to map a proximity ID to its node ID,
> >> > and set it to a new numa_node field of nd_region_desc, which is then
> >> > conveyed to the nd_region device.
> >> >
> >> > The device core arranges for btt and namespace devices to inherit their
> >> > node from their parent region.
> >> >
> >> > Signed-off-by: Toshi Kani <toshi.kani@xxxxxx>
> >> > [djbw: move set_dev_node() from region 'probe' to 'create']
> >>
> >> Sorry, I failed to mention other issue, which led me call set_dev_node()
> >> in probe. nd_async_device_register() calls device_add(), which does:
> >>
> >> /* use parent numa_node */
> >> if (parent)
> >> set_dev_node(dev, dev_to_node(parent));
> >>
> >> and overwrites numa_node to -1. Since region's parent is ndbusN, we
> >> cannot set numa_node to the parent. So, I had to set it in probe.
> >
> > In general, I still don't like leaving it up to ->probe() which is
> > within its rights to fail and not set the node. How about the following
> > that moves it to the bus uevent code? Should get triggered before probe
> > so the numa_node is valid before userspace is ever notified about the
> > device.
> >
> > device_add() does:
> >
> > kobject_uevent(&dev->kobj, KOBJ_ADD);
> > bus_probe_device(dev);
> >
> > ...so I think we're good, agree? I also added a missing init of
> > ndr_desc.numa_node in arch/x86/kernel/pmem.c, see below.
>
> This looks good in a quick manual test. It's interesting/illustrative
> that I inadvertently broke the one bit of the libnvdimm sysfs
> interface that did not have unit test coverage.

Sorry I had some interrupt. Yes, this works fine for region &
namespace. I'd like to check with you for btt since the attach logic
has changed in v2.

Previously, as described in patch 16/17, bttN bound to pmem had a valid
numa_node value, and seeding btt0 had -1.

/sys/bus/nd/devices
|-- btt0/numa_node:-1
|-- btt1/numa_node:0

In this version, there are unbound (seeding?) btt0-3 for every region
(there are 4 regions) and btt4 & 5 bound to pmem0 & 3 on my system.

btt0/numa_node:0
btt1/numa_node:0
btt2/numa_node:1
btt3/numa_node:1
btt4/numa_node:0
btt5/numa_node:1

btt0
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt0
btt1
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/btt1
btt2
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region2/btt2
btt3
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt3
btt4
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt4
btt5
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt5

And unbound bttNs attach to different regions across a reboot.

btt0/numa_node:0
btt1/numa_node:1
btt2/numa_node:1
btt3/numa_node:0
btt4/numa_node:0
btt5/numa_node:1

btt0
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt0
btt1
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt1
btt2
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region2/btt2
btt3
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/btt3
btt4
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt4
btt5
-> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt5

Is this how you'd expect btt to work in this version? (I have not
looked at the btt changes yet)

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/