Re: [BISECTED] Linux 3.12.7 introduces page map handling regression

From: Linus Torvalds
Date: Tue Jan 21 2014 - 21:47:27 EST


On Tue, Jan 21, 2014 at 5:49 PM, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Odds are this also shows up in 3.13, right?

Probably. I don't have a Xen PV setup to test with (and very little
interest in setting one up). And I have a suspicion that it might not
be so much about Xen PV as perhaps about the kind of hardware.

I suspect the issue has something to do with the magic _PAGE_NUMA
tie-in with _PAGE_PRESENT. And then mprotect(PROT_NONE) ends up
removing the _PAGE_PRESENT bit, and now the crazy numa code is
confused.

The whole _PAGE_NUMA thing is a f*cking horrible hack, and shares the
bit with _PAGE_PROTNONE, which is why it then has that tie-in to
_PAGE_PRESENT.

Adding Andrea to the Cc, because he's the author of that horridness.
Putting Steven's test-case here as an attachment for Andrea; maybe
that makes him go "Ahh, yes, silly case".

Also added Kirill, because he was involved in the last _PAGE_NUMA debacle.

Andrea, you can find the thread on lkml, but it boils down to commit
1667918b6483 (backported to 3.12.7 as 3d792d616ba4) breaking the
attached test-case (but apparently only under Xen PV). There it
apparently causes a "BUG: Bad page map .." error.

And I suspect this is another of those "this bug is only visible on
real numa machines, because _PAGE_NUMA isn't actually ever set
otherwise". That has pretty much guaranteed that it gets basically
zero testing, which is not a great idea when coupled with that subtle
sharing of the _PAGE_PROTNONE bit.

It may be that the whole "Xen PV" thing is a red herring, and that
Steven only sees it on that one machine because the host he runs the
PV guest on is a real NUMA machine, while all the other machines he
has tried it on haven't been NUMA. But that's just a possible guess.

Christ, how I hate that _PAGE_NUMA bit. Andrea: the fact that it gets
no testing on any normal machines is a major problem. If it was simple
and straightforward and the code was "obviously correct", it wouldn't
be such a problem, but the _PAGE_NUMA code definitely does not fall
under that "simple and obviously correct" heading.

Guys, any ideas?

Linus
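#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Report which call failed and bail out. */
static void die(const char *what)
{
	perror(what);
	exit(1);
}

int main(void)
{
	/* One private anonymous page, initially read/write. */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		die("mmap");

	/* Tickle the page so that a real PTE gets populated. */
	((char *) p)[0] = 0;

	/* Revoke all access: on x86 this clears _PAGE_PRESENT in the PTE. */
	if (mprotect(p, 4096, PROT_NONE) != 0)
		die("mprotect");

	/* Restore read-only access to the same page. */
	if (mprotect(p, 4096, PROT_READ) != 0)
		die("mprotect");

	/* Tear down the mapping. */
	if (munmap(p, 4096) != 0)
		die("munmap");

	return 0;
}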