On Mon, Jul 26, 2010 at 11:28:18AM +0200, Andre Przywara wrote:

If grep didn't fool me, then the only users in libnuma aware of that bug are the test implementations in numactl-2.0.3/test, namely test/tshm.c (NUMA_MAX_NODES+1) and test/mbind_mig_pages.c (old_nodes->size + 1).

When the mbind() syscall implementation processes the node mask
provided by the user, the last node is accidentally masked out.
This has been present since the dawn of time (aka Before Git). I guess
nobody noticed because libnuma, as the most prominent user of
mbind(), uses large masks (sizeof(long)) and nobody cared if the
64th node is not handled properly. But if the user application
defers the masking to the kernel and provides the number of valid bits
in maxnodes, the last node is always missing.
However, this also affects the special case of maxnodes=0: the manpage
says that mbind(ptr, len, MPOL_DEFAULT, &some_long, 0, 0); should
reset the policy to the default one, but in fact it returns EINVAL.
This patch just removes the decrease-by-one statement. I hope that
there is no workaround code in the wild that relies on the bogus
behavior.
Actually libnuma and likely most existing users rely on it.
That would probably be overkill, but if this behavior is now fixed, it should be documented (in the manpage and in the kernel code).
The only way to change it would be to add new system calls.