numa mm/mempolicy.c maxnode off by one

From: Paul Jackson
Date: Sun Jul 18 2004 - 20:21:09 EST


Andi,

As best I can tell from reading the code, the value of maxnode that is
passed in on the NUMA system calls mbind, get_mempolicy and
set_mempolicy is "off by one". This seems to be undocumented and, so
far as I can tell, unjustified.

The kernel code in mm/mempolicy.c expects a value one larger than is
natural, and then either decrements it or uses "maxnode-1" (it is not
consistent about which). Your libnuma user library hides this
off-by-one by incrementing maxnode by one before making the call.
But we are not all using libnuma. Some of us are using the actual
system calls, for our own nefarious purposes.

For example, on my 256-node system, with 256-bit user nodemasks, I have
to pass in a maxnode value of 257. And the libnuma library, and the
numactl command utility based on it, which are hardcoded for 2048-bit
nodemasks, always pass in a maxnode value of 2049.
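
The arithmetic described above can be sketched in a few lines of C. This
is only a model of the behavior as understood from code reading, not the
kernel or libnuma source; the function names kernel_bits_examined and
wrapper_maxnode are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical model: how many nodemask bits the kernel examines for a
 * given maxnode argument, assuming it decrements maxnode before use.
 */
static unsigned long kernel_bits_examined(unsigned long maxnode)
{
    return maxnode ? maxnode - 1 : 0;
}

/*
 * Hypothetical model of the compensation a user-space wrapper applies:
 * pass one more than the width of the nodemask in bits.
 */
static unsigned long wrapper_maxnode(unsigned long mask_bits)
{
    assert(mask_bits < ULONG_MAX);   /* the +1 must not wrap around */
    return mask_bits + 1;
}
```

Under this model, wrapper_maxnode(256) gives 257 and
wrapper_maxnode(2048) gives 2049, and feeding either result to
kernel_bits_examined recovers the full mask width. Passing the natural
value 256 directly yields only 255 examined bits.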

Someday, someone is going to pass in some nice power of two value,
corresponding to the actual number of nodes in their nodemasks, not
expecting that their last node is being ignored.
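
To make that failure mode concrete, here is a minimal sketch of a
256-bit nodemask in which the highest node is set. It assumes the kernel
honors only bits below maxnode-1, which is my reading of the situation
and not confirmed. The helpers set_node and node_honored are
hypothetical names, not kernel or libnuma functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_BITS      256
#define MASK_LONGS     ((MASK_BITS + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Set one node's bit in a user-space nodemask of MASK_BITS bits. */
static void set_node(unsigned long *mask, unsigned long node)
{
    assert(node < MASK_BITS);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/*
 * Hypothetical model: a node counts only if its bit is set AND its
 * number is below maxnode - 1 (the assumed kernel-side decrement).
 */
static int node_honored(const unsigned long *mask, unsigned long maxnode,
                        unsigned long node)
{
    if (maxnode == 0 || node >= maxnode - 1 || node >= MASK_BITS)
        return 0;
    return (int)((mask[node / BITS_PER_ULONG] >> (node % BITS_PER_ULONG)) & 1UL);
}
```

With node 255 set in the mask, node_honored(mask, 256, 255) is 0 under
this model, while node_honored(mask, 257, 255) is 1. The caller who
passes the power-of-two value loses the last node without any error.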

Andi - could you explain how this came to be, and perhaps recommend a
way to fix it (or explain why it's not broken and shouldn't be fixed)?

I could propose ways to fix this without breaking any libnuma that
you've already shipped. But first I should find out if my understanding
of the situation is correct.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373