Re: [PATCH] perf/x86/intel: Mark expected switch fall-throughs

From: Peter Zijlstra
Date: Thu Jun 27 2019 - 03:36:16 EST


On Wed, Jun 26, 2019 at 03:23:24PM -0700, Nick Desaulniers wrote:
> On Wed, Jun 26, 2019 at 2:55 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Jun 26, 2019 at 11:24:32AM +0200, Peter Zijlstra wrote:
> > > That's pretty atrocious code-gen :/
> >
> > And I know nobody reads comments (I don't either), but I did write one
> > on this as it happens.
>
> I've definitely read that block in include/linux/jump_label.h; can't
> say I fully understand it yet, but appreciate patience and
> explanations.

So the relevant bits are:

* type\branch| likely (1) | unlikely (0)
* -----------+-----------------------+------------------
* | |
* true (1) | ... | ...
* | NOP | JMP L
* | <br-stmts> | 1: ...
* | L: ... |
* | |
* | | L: <br-stmts>
* | | jmp 1b
* | |
* -----------+-----------------------+------------------
* | |
* false (0) | ... | ...
* | JMP L | NOP
* | <br-stmts> | 1: ...
* | L: ... |
* | |
* | | L: <br-stmts>
* | | jmp 1b
* | |
* -----------+-----------------------+------------------

So we have two types, static_key_true, which defaults to true and
static_key_false, which defaults (unsurprisingly) to false. At runtime
they can be switched at will, it is just the initial state which
determines what code we actually need to emit at compile time.

And we have two statements: static_branch_likely(), the branch is likely
-- or we want the block in-line, and static_branch_unlikely(), the
branch is unlikely -- or we want the block out-of-line.

This is coded like:

#define static_branch_likely(x) \
({ \
bool branch; \
if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
branch = !arch_static_branch(&(x)->key, true); \
else if (__builtin_types_compatible_p(typeof(*x), struct static_key_false)) \
branch = !arch_static_branch_jump(&(x)->key, true); \
else \
branch = ____wrong_branch_error(); \
likely(branch); \
})

#define static_branch_unlikely(x) \
({ \
bool branch; \
if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
branch = arch_static_branch_jump(&(x)->key, false); \
else if (__builtin_types_compatible_p(typeof(*x), struct static_key_false)) \
branch = arch_static_branch(&(x)->key, false); \
else \
branch = ____wrong_branch_error(); \
unlikely(branch); \
})

Let's walk through static_branch_unlikely() (the other is very similar,
just reversed).

We use __builtin_types_compatible_p() to compile-time detect which key
type is used, such that we can emit the right initial code:

- static_key_true; we must emit a JMP to the block,
- static_key_false; we must emit a NOP and not execute the block.
- neither; we generate a link error.

Then we take the return value and use __builtin_expect(, 0) on it to
influence the block layout, specifically we want the block to be
out-of-line.

It appears the __builtin_expect() usage isn't working right with LLVM
resuling in that layout issue Thomas spotted. GCC8+ can even place them
in the .text.unlikely section as func.cold.N parts/symbols. But the main
point is to get the block away from the normal I$ content to minimize
impact.