Re: [patch 1/2] x86_64 page fault NMI-safe

From: Ingo Molnar
Date: Wed Jul 14 2010 - 14:47:31 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Ok. I was wondering why anybody would allocate core percpu variables so late
> that this would ever be an issue, but I guess perf is a reasonable such
> case. And reasonable to do from NMI.

Yeah.

Frederic (re-)discovered this problem via very hard-to-debug crashes when he
extended perf call-graph tracing to use a somewhat larger buffer and used
percpu_alloc() for it (which is entirely reasonable in itself).

> That said - grr. I really wish there was some other alternative than adding
> yet more complexity to the exception return path. That "iret re-enables
> NMIs unconditionally" thing annoys me.

Ok. We can solve it by allocating the space from the non-vmalloc percpu area -
8K per CPU.

> In fact, I wonder if we couldn't just do a software NMI disable
> instead? Have a per-cpu variable (in the _core_ percpu areas that get
> allocated statically) that points to the NMI stack frame, and just
> make the NMI code itself do something like
>
> NMI entry:

I think at this point [NMI re-entry] we've corrupted the top of the NMI kernel
stack already, due to entering via the IST stack mechanism, which is
non-nesting and which enters at the same point - right?

We could solve that by copying that small stack frame off before entering the
'generic' NMI routine - but it all feels a bit contrived.
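
To make the frame-copy idea concrete, here is a rough userspace model, not
kernel code. Every name in it (struct nmi_frame, ist_slot, hw_deliver_nmi,
nmi_entry, simulate_nested_nmi) is made up for illustration. It only shows
that a private copy taken before the generic routine runs survives a nested
delivery that overwrites the fixed IST slot. A real entry path would be
assembly and would have to take the copy before anything can re-enable NMIs.

```c
#include <string.h>

/* Hypothetical stand-in for the hardware-pushed exception frame. */
struct nmi_frame {
	unsigned long ip, cs, flags, sp, ss;
};

/* Models the fixed IST slot: every NMI delivery writes its frame here. */
static struct nmi_frame ist_slot;

/* Models hardware delivery: the frame always lands in the same slot. */
static void hw_deliver_nmi(unsigned long interrupted_ip)
{
	memset(&ist_slot, 0, sizeof(ist_slot));
	ist_slot.ip = interrupted_ip;
}

/* Models a nested NMI arriving while the generic routine is running. */
static void simulate_nested_nmi(void)
{
	hw_deliver_nmi(0x2000);
}

/*
 * Entry stub: copy the frame off the IST slot first, then run the
 * generic routine. The return address comes from the private copy,
 * so a nested delivery that clobbers ist_slot does not affect it.
 */
static unsigned long nmi_entry(void (*generic_routine)(void))
{
	struct nmi_frame saved;

	memcpy(&saved, &ist_slot, sizeof(saved));
	if (generic_routine)
		generic_routine();
	return saved.ip;
}
```

Calling hw_deliver_nmi(0x1000) and then nmi_entry(simulate_nested_nmi)
returns the outer frame's 0x1000 even though ist_slot.ip has been overwritten
with 0x2000 by then.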

I feel uneasy about taking page faults from the NMI handler. Even if we
implemented it all correctly, who knows what CPU errata are waiting there to
be discovered, etc ...

I think we should try to muddle through by preventing these situations from
happening (and adding a WARN_ONCE() to the vmalloc page-fault handler would
certainly help as well), and only go to more clever schemes if no other option
looks sane anymore?
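
As a sketch of what such a warning would do: in the kernel it would
presumably be a one-liner along the lines of WARN_ONCE(in_nmi(), ...) in the
vmalloc fault path. The userspace model below only imitates the
warn-only-once behaviour; in_nmi_ctx, warnings_printed and
vmalloc_fault_check() are hypothetical names, not existing kernel symbols.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's in_nmi() context test. */
static bool in_nmi_ctx;

/* Counts emitted warnings so the once-only behaviour can be observed. */
static int warnings_printed;

/*
 * Called from the (modelled) vmalloc page-fault path. Returns true if
 * the fault was taken in NMI context, and prints a warning the first
 * time that happens, staying silent on later occurrences.
 */
static bool vmalloc_fault_check(void)
{
	static bool warned;

	if (!in_nmi_ctx)
		return false;

	if (!warned) {
		warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: vmalloc fault taken in NMI context\n");
	}
	return true;
}
```

The point of warning only once is that an NMI-context fault tends to repeat
at a high rate, and a single report is enough to identify the offending
allocation.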

Thanks,

Ingo