[Regression full nohz] [PATCH] x86: Don't call context tracking APIs on IRQs

From: Frederic Weisbecker
Date: Tue Oct 13 2015 - 12:01:58 EST


I did rant about this before the merge window, but it was basically ignored,
as were all my concerns about the x86 context tracking calls now being based
on regs rather than on context tracking internal state, which makes them more
fragile.

Don't get me wrong, I love this x86 entry code rework, but please don't
ignore others' concerns.

Yes, we could optimize the IRQ context tracking calls by pulling them into
the low-level IRQ code, but only if we manage to drop the current calls in
irq_enter()/irq_exit(). Otherwise they just double the context tracking
calls and add avoidable overhead.
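
To make the doubling concrete, here is a rough user-space model (not kernel
code) that only counts the bookkeeping calls made for one IRQ that interrupts
user mode with no reschedule or signal work pending. The struct and function
names are made up for illustration. The only thing taken from the patch is
which calls happen: enter_from_user_mode() and user_enter() from the entry
code, irq_enter() and irq_exit() from the generic IRQ path.

```c
#include <stdbool.h>

/* Hypothetical model: one counter per simulated IRQ. */
struct irq_model {
	int bookkeeping_calls;	/* RCU/vtime-style accounting calls made */
};

/* Stand-in for any call that does RCU and vtime accounting. */
static void account(struct irq_model *m)
{
	m->bookkeeping_calls++;
}

/*
 * Count the accounting calls for one IRQ taken from user mode.
 * entry_code_tracks == true models the entry code calling the context
 * tracking APIs in addition to irq_enter()/irq_exit().
 */
int tracking_calls_per_irq(bool entry_code_tracks)
{
	struct irq_model m = { 0 };

	if (entry_code_tracks)
		account(&m);	/* models enter_from_user_mode() */

	account(&m);		/* models irq_enter() */
	account(&m);		/* models irq_exit() */

	if (entry_code_tracks)
		account(&m);	/* models user_enter() on the way out */

	return m.bookkeeping_calls;
}
```

In this model tracking_calls_per_irq(true) makes twice as many accounting
calls as tracking_calls_per_irq(false). It says nothing about the real cost of
each call, only that the second pair is redundant as long as
irq_enter()/irq_exit() stay where they are.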


---
From: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Date: Sat, 3 Oct 2015 01:18:09 +0200
Subject: [PATCH] x86: Don't call context tracking APIs on IRQs

IRQs already call irq_enter() and irq_exit(), which take care of RCU
and vtime needs. There is no need to call user_enter() / user_exit()
on IRQs, except at IRQ exit time if we schedule out or handle signals.

Calling them anyway may result in a performance regression when context
tracking is enabled, not to mention that enter_from_user_mode() is called
on every IRQ entry when CONFIG_CONTEXT_TRACKING=y (which is enabled on
many distros) even though context tracking is actually not running,
which defeats the static key optimizations.

This could be optimized by pulling irq_enter()/irq_exit() into the low-level
IRQ code, but that requires more thought.

Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
---
arch/x86/entry/common.c | 11 ++++++++++-
arch/x86/entry/entry_64.S | 11 ++++++-----
2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 80dcc92..1b7a866 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -229,6 +229,7 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
* work to clear some of the flags can sleep.
*/
while (true) {
+ enum ctx_state prev_state;
u32 cached_flags =
READ_ONCE(pt_regs_to_thread_info(regs)->flags);

@@ -237,8 +238,10 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
_TIF_USER_RETURN_NOTIFY)))
break;

+
/* We have work to do. */
local_irq_enable();
+ prev_state = exception_enter();

if (cached_flags & _TIF_NEED_RESCHED)
schedule();
@@ -258,10 +261,16 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
if (cached_flags & _TIF_USER_RETURN_NOTIFY)
fire_user_return_notifiers();

+ exception_exit(prev_state);
+
/* Disable IRQs and retry */
local_irq_disable();
}
+}

+__visible void prepare_exit_to_usermode_track(struct pt_regs *regs)
+{
+ prepare_exit_to_usermode(regs);
user_enter();
}

@@ -314,5 +323,5 @@ __visible void syscall_return_slowpath(struct pt_regs *regs)
#endif

local_irq_disable();
- prepare_exit_to_usermode(regs);
+ prepare_exit_to_usermode_track(regs);
}
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 055a01d..f10b2c4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -513,10 +513,6 @@ END(irq_entries_start)
* tracking that we're in kernel mode.
*/
SWAPGS
-#ifdef CONFIG_CONTEXT_TRACKING
- call enter_from_user_mode
-#endif
-
1:
/*
* Save previous stack pointer, optionally switch to interrupt stack.
@@ -1123,7 +1119,12 @@ ENTRY(error_exit)
TRACE_IRQS_OFF
testl %eax, %eax
jnz retint_kernel
- jmp retint_user
+ /* like retint_user with the call to context tracking */
+ mov %rsp,%rdi
+ call prepare_exit_to_usermode_track
+ TRACE_IRQS_IRETQ
+ SWAPGS
+ jmp restore_regs_and_iret
END(error_exit)

/* Runs on exception stack */
--
2.5.3

