Re: [PATCH] Fix off-by-one bug in mbind() syscall implementation

From: Andre Przywara
Date: Mon Jul 26 2010 - 06:24:44 EST


Andi Kleen wrote:
> On Mon, Jul 26, 2010 at 11:28:18AM +0200, Andre Przywara wrote:
>> When the mbind() syscall implementation processes the node mask
>> provided by the user, the last node is accidentally masked out.
>> This has been present since the dawn of time (aka Before Git). I guess
>> nobody noticed because libnuma, as the most prominent user of
>> mbind(), uses large masks (sizeof(long)) and nobody cared if the
>> 64th node is not handled properly. But if the user application
>> defers the masking to the kernel and provides the number of valid bits
>> in maxnodes, the last node is always missing.
>> However, this also affects the special case of maxnodes=0: the manpage
>> says that mbind(ptr, len, MPOL_DEFAULT, &some_long, 0, 0); should
>> reset the policy to the default one, but in fact it returns EINVAL.
>> This patch just removes the decrease-by-one statement. I hope that
>> there is no workaround code in the wild that relies on the bogus
>> behavior.

> Actually libnuma and likely most existing users rely on it.
If grep didn't fool me, the only code in libnuma that is aware of that
bug is in the tests under numactl-2.0.3/test, namely test/tshm.c
(NUMA_MAX_NODES+1) and test/mbind_mig_pages.c (old_nodes->size + 1).
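
To make the effect concrete, here is a small user-space model of the
masking as described in the patch text. It is only a sketch, not the
kernel code: model_get_nodes() and its "decrement" switch are made up
for illustration, it handles only masks that fit into one unsigned
long, and it assumes the range check happens after the decrement (which
would explain the EINVAL for maxnodes=0).

```c
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Simplified model of how the node mask is trimmed to maxnode bits.
 * Stores the surviving node bits in *out and returns 0, or returns -1
 * as a stand-in for EINVAL. "decrement" selects the current behavior
 * (non-zero) or the behavior with the decrease-by-one removed (zero).
 */
static int model_get_nodes(unsigned long user_mask, unsigned long maxnode,
                           int decrement, unsigned long *out)
{
    if (decrement)
        maxnode--;      /* unsigned: wraps to ULONG_MAX if maxnode was 0 */

    *out = 0;
    if (maxnode == 0)
        return 0;       /* empty mask, nothing to copy */
    if (maxnode > MODEL_BITS_PER_LONG)
        return -1;      /* out of range for this one-word model */

    if (maxnode == MODEL_BITS_PER_LONG)
        *out = user_mask;                           /* keep every bit */
    else
        *out = user_mask & ((1UL << maxnode) - 1);  /* keep low bits */
    return 0;
}
```

In this model, a user mask of 0xf with maxnode=4 comes out as 0x7 when
the decrement is applied, so node 3 is lost. Passing maxnode=5 (the
"+ 1" that the two tests use) yields 0xf again. With maxnode=0 the
decrementing variant returns the error, while the variant without the
decrement returns an empty mask.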

Has this bug been known before?

> The only way to change it would be to add new system calls.
That would probably be overkill, but if this behavior is now set in
stone, it should be documented (in the manpage and in the kernel code).
The actual libnuma code should then be adjusted as well.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
