Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch

From: hpa
Date: Tue Feb 19 2019 - 04:26:19 EST


On February 19, 2019 1:04:09 AM PST, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>On Mon, Feb 18, 2019 at 02:30:21PM -0800, H. Peter Anvin wrote:
>> On 2/16/19 2:30 AM, Peter Zijlstra wrote:
>> > On Fri, Feb 15, 2019 at 08:06:56PM -0800, hpa@xxxxxxxxx wrote:
>> >> This implies we invoke schedule -- a restricted operation
>(consider
>> >> may_sleep) during execution of STAC-enabled code, but *not* as an
>> >> exception or interrupt, since those preserve the flags.
>> >
>> > Meet preempt_enable().
>>
>> I believe this falls under "doctor, it hurts when I do that." And it
>hurts for
>> very good reasons. See below.
>
>I disagree; the basic rule is that if you're preemptible you must also
>be able to schedule and vice-versa. These AC regions violate this.
>
>And, like illustrated, it is _very_ easy to get all sorts inside that
>AC
>crap.
>
>> >> I have serious concerns about this. This is more or less saying
>that
>> >> we have left an unlimited gap and have had AC escape.
>> >
>> > Yes; by allowing normal C in between STAC and CLAC this is so.
>> >
>> >> Does *anyone* see a need to allow this? I got a question at LPC
>from
>> >> someone about this, and what they were trying to do once all the
>> >> layers had been unwound was so far down the wrong track for a root
>> >> problem that actually has a very simple solution.
>> >
>> > Have you read the rest of the thread?
>> >
>> > All it takes for this to explode is a call to a C function that has
>> > tracing on it in between the user_access_begin() and
>user_access_end()
>> > things. That is a _very_ easy thing to do.
>> >
>> > Heck, GCC is allowed to generate that broken code compiling
>> > lib/strnlen_user.c; it doesn't _have_ to inline do_strnlen_user
>(esp.
>> > with CONFIG_OPTIMIZE_INLINING), and making that a function call
>would
>> > get us fentry hooks and ftrace and *BOOM*.
>> >
>> > (Now, of course, its a static function with a single caller, and
>GCC
>> > isn't _THAT_ stupid, but it could, if it wanted to)
>> >
>> > Since the bar is _that_ low for mistakes, I figure we'd better fix
>it.
>> >
>>
>> The question is what "fix it" means.
>
>It means that when we do schedule, the next task doesn't have AC set,
>and when we schedule back, we'll restore our AC when we had it. Unlike
>the current situation, where the next task will run with AC set.
>
>IOW, it confines AC to the task context where it was set.
>
>> I'm really concerned about AC escapes,
>> and everyone else should be, too.
>
>Then _that_ should be asserted.
>
>> For an issue specific to tracing, it would be more appropriate to do
>the
>> appropriate things in the tracing entry/exit than in schedule.
>Otherwise, we
>> don't want to silently paper over mistakes which could mean that we
>run a
>> large portion of the kernel without protection we thought we had.
>>
>> In that sense, calling schedule() with AC set is in fact a great
>place to have
>> a WARN() or even BUG(), because such an event really could imply that
>the
>> kernel has been security compromised.
>
>It is not specific to tracing, tracing is just one of the most trivial
>and non-obvious ways to make it go splat.
>
>There's lot of fairly innocent stuff that does preempt_disable() /
>preempt_enable(). And just a warning in schedule() isn't sufficient,
>you'd have to actually trigger a reschedule before you know your code
>is
>bad.
>
>And like I said; the invariant is: if you're preemptible you can
>schedule and v.v.
>
>Now, clearly you don't want to mark these whole regions as
>!preemptible,
>because then you can also not take faults, but somehow you're not
>worried about the whole fault handler, but you are worried about the
>scheduler ?!? How does that work? The fault handler can touch a _ton_
>more code. Heck, the fault handler can schedule.
>
>So either pre-fault, and run the whole AC crap with preemption disabled
>and retry, or accept that we have to schedule.
>
>> Does that make more sense?
>
>It appears to me you're going about it backwards.

I'm not worried about the fault handler, because AC is always presented/disabled on exception entry; otherwise I would of course be very concerned.

To be clear: I'm not worried about the scheduler itself. I'm worried about how much code we have gone through to get there. Perhaps the scheduler itself is not the right point to probe for it, but we do need to catch things that have gone wrong, rather than just leaving the door wide open.

I would personally be far more comfortable saying you have to disable preemption in user access regions. It may be an unnecessary constraint for x86 and ARM64, but it is safe.

And Julien, yes, it is "somewhat arbitrary", but so are many engineering tradeoffs. Not everything has a very sharply definable line.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.