Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

From: Nishanth Aravamudan
Date: Mon Sep 28 2015 - 13:34:45 EST


On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote:
> Problem description:
> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> numbered (possibly) as 0,1,16,17. At a lower level, we map the chipid
> got from device tree is naturally mapped (directly) to nid.

chipid is a OPAL concept, I believe, and not documented in PAPR... How
does this work under PowerVM?

> Potential side effect of that is:
>
> 1) There are several places in kernel that assumes serial node numbering.
> and memory allocations assume that all the nodes from 0-(highest nid)
> exist inturn ending up allocating memory for the nodes that does not exist.
>
> 2) For virtualization use cases (such as qemu, libvirt, openstack), mapping
> sparse nid of the host system to contiguous nids of guest (numa affinity,
> placement) could be a challenge.
>
> Possible Solutions:
> 1) Handling the memory allocations is kernel case by case: Though in some
> cases it is easy to achieve, some cases may be intrusive/not trivial.
> at the end it does not handle side effect (2) above.
>
> 2) Map the sparse chipid got from device tree to a serial nid at kernel
> level (The idea proposed in this series).
> Pro: It is more natural to handle at kernel level than at lower (OPAL) layer.
> con: The chipid is in device tree no longer the same as nid in kernel

Is there any debugging/logging? Looks like not -- so how does a sysadmin
map from firmware-provided values to the Linux values? That's going to
make debugging of large systems (PowerVM or otherwise) less than
pleasant, it seems? Possibly you could put something in sysfs?

-Nish

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/