Re: [PATCH] x86/irq: Preserve vector in orig_ax for APIC code

From: Alexander Graf
Date: Wed Aug 26 2020 - 12:34:40 EST




On 26.08.20 16:27, Thomas Gleixner wrote:

On Wed, Aug 26 2020 at 13:53, Alexander Graf wrote:
Commit 633260fa143 ("x86/irq: Convey vector as argument and not in ptregs")
changed the handover logic of the vector identifier from ~vector in orig_ax
to purely register based. Unfortunately, this field has another consumer
in the APIC code which the commit did not touch. The net result was that
IRQ balancing did not work and instead resulted in interrupt storms, slowing
down the system.

The net result is an observationof the symptom but that does not explain
what the underlying technical issue is.

This patch restores the original semantics that orig_ax contains the vector.
When we receive an interrupt now, the actual vector number stays stored in
the orig_ax field which then gets consumed by the APIC code.

To ensure that nobody else trips over this in the future, the patch also adds
comments at strategic places to warn anyone who would refactor the code that
there is another consumer of the field.

With this patch in place, IRQ balancing works as expected and performance
levels are restored to previous levels.

There's a lot of 'This patch and we' in that changelog. Care to grep
for 'This patch' in Documentation/process/ ?

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017..22e829c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -727,7 +727,7 @@ SYM_CODE_START_LOCAL(asm_\cfunc)
ENCODE_FRAME_POINTER
movl %esp, %eax
movl PT_ORIG_EAX(%esp), %edx /* get the vector from stack */
- movl $-1, PT_ORIG_EAX(%esp) /* no syscall to restart */
+ /* keep vector on stack for APIC's irq_complete_move() */

Yes that's fixing your observed wreckage, but it introduces a worse one.

user space
-> interrupt
push vector into orig_ax (values are in the ranges of 0-127 and -128 - 255
except for the system vectors which do
not go through this code)
handle()
...
exit_to_user_mode_loop()
arch_do_signal()
/* Did we come from a system call? */
if (syscall_get_nr(current, regs) >= 0) {

---> BOOM for any vector 0-127 because syscall_get_nr()
resolves to regs->orig_ax

Going to be fun to debug.

Hah, that's the code flow I was looking for to understand why the value was negative in the first place. Thanks a lot for pointing it out!


The below nasty hack cures it, but I hate it with a passion. I'll look
deeper for a sane variant.

An alternative (that doesn't make the code easier to read, but would fix the issue at hand) would be touse a pushq imm16 with vector | 0x8000 instead to always make the value negative, no?


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879