Re: [Regression full nohz] [PATCH] x86: Don't call context tracking APIs on IRQs

From: Andy Lutomirski
Date: Tue Oct 13 2015 - 12:31:40 EST


On Tue, Oct 13, 2015 at 9:01 AM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> I did rant about this before the merge window but this got basically ignored,
> as all my concerns about x86 context tracking calls that are now based on
> regs and not context tracking internal states, making it more fragile.
>
> Don't get me wrong, I love this x86 entry code rework but please don't
> ignore other's concerns.
>
> Yes we could optimize IRQ context tracking calls by pulling them on
> low level IRQ code, but only if we manage to spare the current calls
> on irq_enter/irq_exit. Otherwise they just double the context tracking
> calls and result in avoidable overhead.
>
>
> ---
> From: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Date: Sat, 3 Oct 2015 01:18:09 +0200
> Subject: [PATCH] x86: Don't call context tracking APIs on IRQs
>
> IRQs already call irq_enter() and irq_exit() which take care of RCU
> and vtime needs. There is no need to call user_enter() / user_exit()
> on IRQs except on IRQ exit time if we schedule out or handle signals.
>
> This may result in performance regression when context tracking is
> enabled, not to mention that enter_from_user_mode() is called all the
> time on IRQ entry when CONFIG_CONTEXT_TRACKING=y (which is enabled on
> many distros) even though context tracking is actually not running,
> breaking the static key optimizations.

I would be rather surprised if that makes a significant difference.
It's a call to a function that should be very short right after an
exceedingly heavy serializing operation (namely interrupt delivery).
Is this easy to benchmark?

A nicer approach IMO would be to fix the static key optimizations.

>
> This could be optimized with pulling irq_enter/exit to low level irq
> code but that requires more thoughts.
>

Given the choice, I'd much rather do it that way.

FWIW, we're extremely close to being able to say (and verify at
runtime!) that any kernel code with IRQs on is *always* tracked as
kernel mode. Unless I've missed something, the single remaining
exception is the 64-bit native syscall entry path. Once that's done,
we can drop exception_enter entirely and we can drop the
irq_save/irq_restore from the context tracking code itself. (Rik
thinks that the save/restore is a fairly large fraction of the total
overhead in the context-tracking-enabled case.)

This patch is a step back from that.

> Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> ---
> arch/x86/entry/common.c | 11 ++++++++++-
> arch/x86/entry/entry_64.S | 11 ++++++-----
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 80dcc92..1b7a866 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -229,6 +229,7 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
> * work to clear some of the flags can sleep.
> */
> while (true) {
> + enum ctx_state prev_state;
> u32 cached_flags =
> READ_ONCE(pt_regs_to_thread_info(regs)->flags);
>
> @@ -237,8 +238,10 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
> _TIF_USER_RETURN_NOTIFY)))
> break;
>
> +
> /* We have work to do. */
> local_irq_enable();
> + prev_state = exception_enter();

This loop is IMO fairly gross. We enter prepare_exit_to_usermode from
a context that is either tracked as kernel mode or that was tracked as
kernel mode and then switched back early (due to irq_exit or
whatever), and now we run a loop and switch back and forth? I had
hoped that this switching back and forth in a loop would stay dead
after I killed it :)

>
> if (cached_flags & _TIF_NEED_RESCHED)
> schedule();
> @@ -258,10 +261,16 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
> if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> fire_user_return_notifiers();
>
> + exception_exit(prev_state);
> +
> /* Disable IRQs and retry */
> local_irq_disable();
> }
> +}
>
> +__visible void prepare_exit_to_usermode_track(struct pt_regs *regs)
> +{
> + prepare_exit_to_usermode(regs);
> user_enter();
> }
>

This change here seems likely to break things. All of the new code
assumes that it's okay to call prepare_exit_to_usermode and then
immediately go back to user mode.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/