Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might_sleep

From: Linus Torvalds
Date: Wed Nov 19 2014 - 19:37:28 EST


On Wed, Nov 19, 2014 at 4:13 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> No drugs, just imprecision. This series doesn't change NMI handling
> at all. It only changes machine_check, int3, debug, and stack_segment.
> (Why is #SS using IST stacks anyway?)

.. ok, we were talking about adding an explicit preemption count to
NMI, and then you wanted to make that conditional, and that kind of
freaked me out.

> So my point stands: if machine_check is going to be conditionally
> atomic, then that condition needs to be expressed somewhere.

I'd still prefer to keep that knowledge in one place, rather than
adding *another* completely ad-hoc thing in addition to what we
already have.

Also, I really don't think it should be about the particular stack
you're using. Sure, if a debug fault happens in user space, the fault
handler could sleep if it runs on the regular stack, but our
"might_sleep()" checks are about catching things that *could* be problematic,
even if the sleep never happens. And so, might_sleep() _should_
actually trigger, even if it's not using the IST stack, because *if*
the debug exception happened in kernel space, then we should warn.
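
To make the point about might_sleep() concrete, here is a small userspace
sketch. It is a toy model, not kernel code: every model_* name is
hypothetical, and the real might_sleep() checks more than a single counter.
It only shows that the annotation fires based on the calling context, whether
or not the sleep actually happens on that particular call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model; none of these are the real kernel definitions. */
static int model_preempt_count;	/* > 0 means "atomic, must not sleep" */
static int model_warnings;	/* how many times the check has fired */

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);	/* catch unbalanced enable */
	model_preempt_count--;
}

/* Fires whenever the caller is atomic, whether or not a sleep follows. */
static void model_might_sleep(const char *where)
{
	if (model_preempt_count > 0) {
		model_warnings++;
		fprintf(stderr, "warning: possible sleep in atomic context at %s\n",
			where);
	}
}

/*
 * A handler that only sleeps on a rare path, but annotates unconditionally.
 * Returns true if this particular call would really have slept.
 */
static bool model_handler(bool rare_path)
{
	model_might_sleep("model_handler");
	return rare_path;
}
```

Calling model_handler(false) between model_preempt_disable() and
model_preempt_enable() still bumps model_warnings, even though that call
never takes the sleeping path.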

So I'd actually *prefer* to have special hacks that perhaps then
"undo" the preemption count if the code expressly tests for "did this
happen in user space, then I know I'm safe". But then it's an
*explicit* thing, not something that just magically works because
nobody even thought about it, and the trap happened in user space.

See the argument? I'd *rather* see code like

    /* Magic */
    if (user_mode(regs)) {
        .. verify that we're using the normal kernel stack
        .. enable interrupts, enable preemption
        .. this is the explicit special case and it is aware
        .. of being special
    }

even if on the face of it it looks hacky. But an *explicit* hack is
preferable to something that just "happens" to work only for the
user-mode case.
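
One way the shape of that explicit special case could look, as a
self-contained userspace sketch. This is an illustration under stated
assumptions, not the actual patch: struct model_regs, struct model_ctx and
all model_* functions are hypothetical stand-ins for pt_regs, the
preemption/interrupt state and the real entry code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pt_regs: only records where we trapped. */
struct model_regs {
	bool from_user;
};

enum model_stack { MODEL_STACK_NORMAL, MODEL_STACK_IST };

/* Hypothetical per-context state the handler manipulates. */
struct model_ctx {
	int preempt_count;
	bool irqs_enabled;
	enum model_stack stack;
};

static bool model_user_mode(const struct model_regs *regs)
{
	return regs->from_user;
}

/* Entry path: always mark the context atomic, whatever stack we are on. */
static void model_exception_enter(struct model_ctx *ctx)
{
	ctx->preempt_count++;
	ctx->irqs_enabled = false;
}

/*
 * The explicit special case: only when the trap came from user space do we
 * verify the stack and then undo the atomic marking. Returns true if the
 * handler may sleep afterwards.
 */
static bool model_handler_may_sleep(struct model_ctx *ctx,
				    const struct model_regs *regs)
{
	/* Magic */
	if (model_user_mode(regs)) {
		assert(ctx->stack == MODEL_STACK_NORMAL);	/* verify stack */
		ctx->irqs_enabled = true;			/* enable interrupts */
		ctx->preempt_count--;				/* enable preemption */
		return true;
	}
	return false;	/* kernel-mode trap: stay atomic */
}
```

The design point is that the default is "atomic", and the user-mode path has
to opt out of it in code that anyone can read and grep for.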

Linus