Re: [PATCH] perf/x86/intel: Mark expected switch fall-throughs

From: Peter Zijlstra
Date: Thu Jun 27 2019 - 03:11:32 EST


On Wed, Jun 26, 2019 at 03:33:36PM -0700, Nick Desaulniers wrote:
> On Wed, Jun 26, 2019 at 9:31 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Jun 25, 2019 at 11:47:06PM +0200, Thomas Gleixner wrote:
> > > > On Tue, Jun 25, 2019 at 09:53:09PM +0200, Thomas Gleixner wrote:
> >
> > > > > but it also makes objtool unhappy:
> >
> > > > > arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool: get_fixed_ranges()+0x9b: unreachable instruction
> >
> > > I just checked two of them in the disassembly. In both cases it's jump
> > > label related. Here is one:
> > >
> > > asm volatile("1: rdmsr\n"
> > > 410: b9 59 02 00 00 mov $0x259,%ecx
> > > 415: 0f 32 rdmsr
> > > 417: 49 89 c6 mov %rax,%r14
> > > 41a: 48 89 d3 mov %rdx,%rbx
> > > return EAX_EDX_VAL(val, low, high);
> > > 41d: 48 c1 e3 20 shl $0x20,%rbx
> > > 421: 48 09 c3 or %rax,%rbx
> > > 424: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > > 429: eb 0f jmp 43a <get_fixed_ranges+0xaa>
> > > do_trace_read_msr(msr, val, 0);
> > > 42b: bf 59 02 00 00 mov $0x259,%edi <------- "unreachable"
>
> I assume if 0x42b is unreachable, that's bad as $0x259 is never stored
> in %edi before the call to get_fixed_ranges+0xaa...

So what happens is that the __jump_table entry for 424 is wrong. When we
enable that key (read msr tracepoint) the code will jump to another
instance of the read msr tracepoint and continue running from there.

So we'll jump from one inlined instance to another, with all the
ramifications thereof; the code-flow will be completely screwy.

> > > 430: 48 89 de mov %rbx,%rsi
> > > 433: 31 d2 xor %edx,%edx
> > > 435: e8 00 00 00 00 callq 43a <get_fixed_ranges+0xaa>
> > > 43a: 44 89 35 00 00 00 00 mov %r14d,0x0(%rip) # 441 <get_fixed_ranges+0xb1>
> >
> > Thomas provided the actual .o file, and from that we find that the
> > .rela__jump_table entries look like:
> >
> > 000000000010 000100000002 R_X86_64_PC32 0000000000000000 .text + 3e9
> > 000000000014 000100000002 R_X86_64_PC32 0000000000000000 .text + 3f0
> > 000000000018 006100000018 R_X86_64_PC64 0000000000000000 __tracepoint_read_msr + 8
>
> I assume these relocations come from arch_static_branch() (and thus
> appear in triples?)
>
> 21 static __always_inline bool arch_static_branch(struct static_key
> *key, bool branch)
> 22 {
> 23 asm_volatile_goto("1:"
> 24 ".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"

> 25 ".pushsection __jump_table, \"aw\" \n\t"
> 26 _ASM_ALIGN "\n\t"
> 27 ".long 1b - ., %l[l_yes] - . \n\t" // 1, 2
> 28 _ASM_PTR "%c0 + %c1 - .\n\t" // 3
> 29 ".popsection \n\t"

Yes, it's lines 25-29 ^ that generate the jump_table entries. The first
entry is the code location, the second is the jump target and the third
is a pointer (and two LSB state bits) to the key this belongs to.
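
To make that layout concrete, here is a minimal sketch of how one such
record can be decoded. It is not lifted from the tree: the struct and
helper names (jump_entry_sketch, entry_code() and friends) are made up
for illustration, and it assumes all three fields are stored relative
to their own address, as the "- ." in the asm above suggests.

```c
#include <stdint.h>

/* Hypothetical mirror of one __jump_table record (names are made up). */
struct jump_entry_sketch {
	int32_t  code;   /* 1: patch site (the NOP), relative to &code */
	int32_t  target; /* 2: l_yes label, relative to &target */
	intptr_t key;    /* 3: key, relative to &key; two LSBs carry state */
};

/* Absolute address of the patch site. */
static uintptr_t entry_code(const struct jump_entry_sketch *e)
{
	return (uintptr_t)((intptr_t)&e->code + e->code);
}

/* Absolute address of the jump target. */
static uintptr_t entry_target(const struct jump_entry_sketch *e)
{
	return (uintptr_t)((intptr_t)&e->target + e->target);
}

/* Absolute address of the key, with the two state bits masked off. */
static uintptr_t entry_key(const struct jump_entry_sketch *e)
{
	return (uintptr_t)((intptr_t)&e->key + (e->key & ~(intptr_t)3));
}

/* The two low state bits on their own. */
static unsigned int entry_state_bits(const struct jump_entry_sketch *e)
{
	return (unsigned int)(e->key & 3);
}
```

Masking the low bits of a relative offset only works because both the
key and the record itself are suitably aligned, which the sketch
assumes.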

The compiler emits .rela objects for these and this is what we use with
objtool -- just like a linker would to resolve and create the actual
__jump_table section.
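
For reference, resolving one of those R_X86_64_PC32 entries boils down
to S + A - P (symbol address plus addend minus the place being
patched), truncated to 32 bits. A minimal sketch follows; the helper
name resolve_pc32() is made up, and any section base addresses you
feed it are placeholders, since the .o file has not been laid out yet.

```c
#include <stdint.h>

/*
 * Value a linker would store for an R_X86_64_PC32 relocation:
 *   sym    - address of the symbol (S), e.g. the start of .text
 *   addend - the relocation addend (A), e.g. 0x424
 *   place  - address of the 32-bit field being patched (P)
 * All three are hypothetical inputs; this is an illustration only.
 */
static int32_t resolve_pc32(uint64_t sym, int64_t addend, uint64_t place)
{
	return (int32_t)(sym + (uint64_t)addend - place);
}
```

With .text assumed at 0x1000 and the jump table section assumed at
0x8000, the entry at offset 0x20 with ".text + 424" resolves to
0x1424 - 0x8020 = -0x6bfc; adding the field's own address back gives
0x1424 again, which is the NOP.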

> 30 : : "i" (key), "i" (branch) : : l_yes);
>
> > 000000000020 000100000002 R_X86_64_PC32 0000000000000000 .text + 424
> > 000000000024 000100000002 R_X86_64_PC32 0000000000000000 .text + 3f0
> > 000000000028 006100000018 R_X86_64_PC64 0000000000000000 __tracepoint_read_msr + 8
> >
> > From this we find that the jump target that goes with the NOP at +424 is
> > +3f0, not +42b as one would expect.
> >
> > And as Josh noted, it is also 'weird' that this +3f0 is the very same as
> > the target for the previous entry.
>
> (Ok, I think I did talk to Josh about this, and I think he did mention
> something about the jump targets, but I didn't really understand the
> issue well at the time).

So what we have here is two instances of the same read msr inline. They
have code at different offsets (+3e9 and +424 resp.) but somehow the
compiler messed up and collapsed their jump target (or didn't properly
de-duplicate -- I've no idea how inline instantiation actually works).
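
A crude way to look for this pattern is to scan the (code, target)
pairs recovered from .rela__jump_table and flag any target shared by
two different code sites. The sketch below does just that; it is a
heuristic for illustration, not what objtool does, and the names
(struct site, first_shared_target()) are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* One jump-table record reduced to its two .text offsets. */
struct site {
	uint64_t code;   /* offset of the NOP */
	uint64_t target; /* offset of the jump target */
};

/*
 * Return the index of the first record whose target is also used by
 * an earlier record at a different code offset, or -1 if there is none.
 */
static int first_shared_target(const struct site *s, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		for (size_t j = 0; j < i; j++) {
			if (s[i].target == s[j].target &&
			    s[i].code != s[j].code)
				return (int)i;
		}
	}
	return -1;
}
```

Fed the two records from this object, (+3e9, +3f0) and (+424, +3f0),
it flags the second one; with the expected target +42b in place it
flags nothing.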

> >
> > When we compare the code at both sites, we find:
> >
> > 3f0: bf 58 02 00 00 mov $0x258,%edi
> > 3f5: 48 89 de mov %rbx,%rsi
> > 3f8: 31 d2 xor %edx,%edx
> > 3fa: e8 00 00 00 00 callq 3ff <get_fixed_ranges+0x6f>
> > 3fb: R_X86_64_PC32 do_trace_read_msr-0x4
> >
> > vs
> >
> > 42b: bf 59 02 00 00 mov $0x259,%edi
> > 430: 48 89 de mov %rbx,%rsi
> > 433: 31 d2 xor %edx,%edx
> > 435: e8 00 00 00 00 callq 43a <get_fixed_ranges+0xaa>
> > 436: R_X86_64_PC32 do_trace_read_msr-0x4
> >
> > Which is not in fact the same code.
> >
> > So for some reason the .rela__jump_table are buggy on this clang build.
>
> So that sounds like a correctness bug then.

Yes, this is very very very bad. Like I wrote above, this results in
code flow moving from one inline'd instance into another, it completely
wrecks code flow integrity.

> (I'd been doing testing
> with the STATIC_KEYS_SELFTEST, which I guess doesn't expose this).
> I'm kind of surprised we can boot and pass STATIC_KEYS_SELFTEST. Any
> way you can help us pare down a test case?

It looks like an inlining bug, and a rare one at that. The static key
self-tests simply don't trigger it. Like Thomas wrote, of all the
jump_labels in the kernel, only 6 go 'funny' for whatever reason.