Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

From: Andy Lutomirski
Date: Tue Aug 18 2015 - 19:08:17 EST


On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
>> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
>> >> This fixes a couple minor holes if we took an IRQ very early in syscall
>> >> processing:
>> >>
>> >> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
>> >> was fine), but we could warn if all the debugging options were
>> >> set.
>> >
>> > So this is fixing issues after your changes that call user_exit() from
>> > IRQs, right?
>>
>> Yes. Here's an example splat, courtesy of Sasha:
>>
>> https://gist.github.com/sashalevin/a006a44989312f6835e7
>>
>> >
>> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
>> > That's where the real issue is.
>>
>> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
>> when entering the kernel for a non-NMI reason.
>
> Why? IRQs don't need that! We already have irq_enter()/irq_exit().
>

Those are certainly redundant. I want to have a real hook to call
that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ
context from CONTEXT_KERNEL" (aka noop), but that doesn't currently
exist.

> And we don't want to call rcu_user_*() pairs on IRQs, you're
> introducing a serious performance regression here! And I'm talking about
> the code that's currently in -tip.

Is there an easy way to fix it? For example, could we figure out what
makes it take so long and make it faster? If we need to, we could
back out the IRQ bit and change the assertions for 4.3, but I'd rather
keep the exact context tracking if at all possible.

>
>> That means that we can
>> avoid all of the (expensive!) checks for what context we're in.
>
> If you're referring to context tracking, the context check is a per-cpu
> read. Not something that's usually considered expensive.

In -tip, there aren't even extra branches, except those imposed by the
user_exit implementation.

>
>> It also means that (other than IRQs, which need further cleanup), we only
>> switch once per user/kernel switch.
>
> ???

In 4.2 and before, we can switch multiple times on the way out of the
kernel, via SCHEDULE_USER, do_notify_resume, etc. In -tip, we do it
exactly once no matter what.

>
>>
>> The cost for doing should be essentially zero, modulo artifacts from
>> poor inlining.
>
> And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
> implying full memory barriers. Plus the unnecessary vtime accounting that doubles
> the existing one in irq_enter/exit() (those even imply a lock currently, which will
> probably be turned to seqcount, but still, full memory barriers...).
>
> I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
> concerns current tip:x86/asm).

Why do we need these heavyweight barriers?

If there's actually a measurable performance hit in IRQs in -tip, then
can we come up with a better fix? For example, we could change all
the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ
context" and make the IRQ entry do a lighter weight context tracking
operation.

But I think I'm still missing something fundamental about the
performance: why is irq_enter() any faster than user_exit()?

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/