Re: static_branch/jump_label vs branch merging

From: Peter Zijlstra
Date: Fri Apr 09 2021 - 16:10:35 EST


On Fri, Apr 09, 2021 at 03:21:49PM -0400, David Malcolm wrote:
> [Caveat: I'm a gcc developer, not a kernel expert]
>
> But it's not *quite* a global constant, or presumably you would be
> simply using a global constant, right? As the optimizer gets smarter,
> you don't want to have it one day decide that actually it really is
> constant, and optimize away everything at compile-time (e.g. when LTO
> is turned on, or whatnot).

Right; as I said, the result is not a constant, but every invocation,
ever, will return the same result. Small but subtle difference :-)

> I get the impression that you're resorting to assembler because you're
> pushing beyond what the C language can express.

Of course :-) I tend to always push waaaaay past what C considers sane.
Let's say I'm firmly in the C-as-Optimizing-Assembler camp :-)

> Taking things to a slightly higher level, am I right in thinking that
> what you're trying to achieve is a control flow construct that almost
> always takes one of the given branches, but which can (very rarely) be
> switched to permanently take one of the other branches, and that you
> want the lowest possible overhead for the common case where the
> control flow hasn't been touched yet?

Correct, that's what it is. We do runtime code patching to flip the
branch if/when needed. We've been doing this for many many years now.

Today's issue is all this clever stuff defeating some simple
optimizations.

> (and presumably little overhead for when it
> has been?)... and that you want to be able to merge repeated such
> conditionals.

This. The 'static' branches have been upstream and in use ever since
GCC added asm-goto; they were in fact the driving force behind getting
asm-goto implemented. That was 2010, according to git history.

So we emit, using asm goto, either a "NOP5" or a "JMP.d32" (x86
speaking), plus an entry in a special section in which we encode the key
address, the instruction address and the jump target.
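
To make the two encodings concrete, here is a rough userspace sketch of
what one such site boils down to. It is not the kernel's code: the
struct layout, the names (jump_entry_sketch, encode_site) and the use of
plain buffer offsets instead of real addresses are all assumptions made
for illustration. The only hard facts in it are the x86 bytes: a 5-byte
NOP (0f 1f 44 00 00) and JMP rel32 (e9 + 32-bit displacement, relative
to the end of the instruction).

```c
#include <stdint.h>
#include <string.h>

#define INSN_LEN 5  /* both NOP5 and JMP.d32 are 5 bytes on x86 */

/* Hypothetical model of one entry in the special section. */
struct jump_entry_sketch {
    uint32_t code;    /* offset of the 5-byte instruction */
    uint32_t target;  /* offset of the jump target */
    const void *key;  /* address of the key this site belongs to */
};

/* Produce the bytes for this site: NOP5 when disabled, JMP.d32 when enabled. */
static void encode_site(const struct jump_entry_sketch *e, int enabled,
                        uint8_t out[INSN_LEN])
{
    static const uint8_t nop5[INSN_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
    uint32_t rel;
    int i;

    if (!enabled) {
        memcpy(out, nop5, INSN_LEN);
        return;
    }

    /* rel32 is measured from the end of the jump; wraps for backward jumps. */
    rel = e->target - (e->code + INSN_LEN);
    out[0] = 0xe9;
    for (i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i));  /* little-endian */
}
```

A site at offset 0x10 jumping to 0x40 would come out as e9 2b 00 00 00
when enabled, since 0x40 - (0x10 + 5) = 0x2b.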

GCC, not knowing what the asm does, only sees the 2 edges and all is
well.

Then, at runtime, when we decide we want the other edge for a given key,
we iterate our section and rewrite the code to either NOP5 or JMP.d32
with the correct jump target.
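
The flip itself can be sketched roughly like this, again as a userspace
simulation and not the actual kernel implementation. The "text" here is
just a byte array, and static_key_sketch, jump_entry_sketch, write_site
and flip_key are made-up names. Patching live code needs care about
other CPUs executing it at the same time, which this sketch ignores
entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INSN_LEN 5  /* NOP5 and JMP.d32 are both 5 bytes */

/* Hypothetical key: only remembers which edge is currently selected. */
struct static_key_sketch {
    int enabled;
};

/* Hypothetical section entry: offsets into a simulated text buffer. */
struct jump_entry_sketch {
    uint32_t code;                  /* where the instruction lives */
    uint32_t target;                /* where the jump should land */
    struct static_key_sketch *key;  /* which key controls this site */
};

/* Rewrite one site in place as NOP5 (disabled) or JMP.d32 (enabled). */
static void write_site(uint8_t *text, const struct jump_entry_sketch *e,
                       int enabled)
{
    static const uint8_t nop5[INSN_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
    uint8_t *insn = text + e->code;
    uint32_t rel = e->target - (e->code + INSN_LEN);
    int i;

    if (!enabled) {
        memcpy(insn, nop5, INSN_LEN);
        return;
    }
    insn[0] = 0xe9;
    for (i = 0; i < 4; i++)
        insn[1 + i] = (uint8_t)(rel >> (8 * i));  /* little-endian rel32 */
}

/* Walk the table and patch every site belonging to 'key'.
 * Returns how many sites were rewritten. */
static size_t flip_key(uint8_t *text, const struct jump_entry_sketch *table,
                       size_t n, struct static_key_sketch *key, int enabled)
{
    size_t i, patched = 0;

    key->enabled = enabled;
    for (i = 0; i < n; i++) {
        if (table[i].key != key)
            continue;  /* site belongs to some other key */
        write_site(text, &table[i], enabled);
        patched++;
    }
    return patched;
}
```

Note that the cost of a flip scales with the number of sites in the
table, while the sites themselves pay nothing beyond the NOP or the
direct jump.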

> It's kind of the opposite of "volatile" - something that the user is
> happy for the compiler to treat as not changing much, as opposed to
> something the user is warning the compiler about changing from under
> it. A "const-ish" value?

Just so. Encoded in text, not data.

> Sorry if I'm being incoherent; I'm kind of thinking aloud here.

No problem, we're way outside of what is generally considered normal,
and I did somewhat assume people were familiar with our 'dodgy'
construct (some on this list are more than others).

I hope it's all a little clearer now.