Re: static_branch/jump_label vs branch merging

From: David Malcolm
Date: Fri Apr 09 2021 - 17:07:31 EST


On Fri, 2021-04-09 at 22:09 +0200, Peter Zijlstra wrote:
> On Fri, Apr 09, 2021 at 03:21:49PM -0400, David Malcolm wrote:
> > [Caveat: I'm a gcc developer, not a kernel expert]
> >
> > But it's not *quite* a global constant, or presumably you would be
> > simply using a global constant, right?  As the optimizer gets
> > smarter, you don't want to have it one day decide that actually it
> > really is constant, and optimize away everything at compile-time
> > (e.g. when LTO is turned on, or whatnot).
>
> Right; as I said, the result is not a constant, but any invocation
> ever, will return the same result. Small but subtle difference :-)
>
> > I get the impression that you're resorting to assembler because
> > you're pushing beyond what the C language can express.
>
> Of course :-) I tend to always push waaaaay past what C considers
> sane. Let's say I'm firmly in the C-as-Optimizing-Assembler camp :-)

Yeah, I got that :)

> > Taking things to a slightly higher level, am I right in thinking
> > that what you're trying to achieve is a control flow construct that
> > almost always takes one of the given branches, but which can (very
> > rarely) be switched to permanently take one of the other branches,
> > and that you want the lowest possible overhead for the common case
> > where the control flow hasn't been touched yet?
>
> Correct, that's what it is. We do runtime code patching to flip the
> branch if/when needed. We've been doing this for many many years now.
>
> The issue of today is all this clever stuff defeating some simple
> optimizations.

It's certainly clever - though, if you'll forgive me, that's not always
a good thing :)

> > (and presumably little overhead for when it
> > has been?)... and that you want to be able to merge repeated such
> > conditionals.
>
> This.. So the 'static' branches have been upstream and in use ever
> since GCC added asm-goto, it was in fact the driving force to get
> asm-goto implemented. This was 2010 according to git history.
>
> So we emit, using asm goto, either a "NOP5" or "JMP.d32" (x86
> speaking), and a special section entry into which we encode the key
> address and the instruction address and the jump target.
>
> GCC, not knowing what the asm does, only sees the 2 edges and all is
> well.
>
> Then, at runtime, when we decide we want the other edge for a given
> key, we iterate our section and rewrite the code to either nop5 or
> jmp.d32 with the correct jump target.
>
> > It's kind of the opposite of "volatile" - something that the user
> > is happy for the compiler to treat as not changing much, as opposed
> > to something the user is warning the compiler about changing from
> > under it.  A "const-ish" value?
>
> Just so. Encoded in text, not data.
>
> > Sorry if I'm being incoherent; I'm kind of thinking aloud here.
>
> No problem, we're way outside of what is generally considered normal,
> and I did somewhat assume people were familiar with our 'dodgy'
> construct (some on this list are more than others).
>
> I hope it's all a little clearer now.

Yeah. This is actually on two mailing lists; I'm only subscribed to
linux-toolchains, which AIUI is about sharing ideas between Linux and
the toolchains.

You've built a very specific thing out of asm-goto to fulfil the tough
requirements you outlined above - as well as the nops, there's a thing
in another section to contend with.
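
To check that I've understood the moving parts, here's a rough sketch in
plain C of the bookkeeping as I read it from your description. It's a
simulation only: the "text" is an array of records rather than real
instructions, nothing is actually patched, and every name in it
(struct site, struct entry, OP_NOP5, OP_JMP32, flip_key) is made up for
illustration and is not the kernel's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two instruction forms at a branch site. */
enum { OP_NOP5, OP_JMP32 };

/* One simulated branch site: which form is in place, and where a jump goes. */
struct site {
    int opcode;
    int target;   /* only meaningful when opcode == OP_JMP32 */
};

/* One simulated entry in the special section: key, site, jump target. */
struct entry {
    const void *key;  /* address identifying the key */
    size_t site;      /* index of the branch site in the simulated text */
    int target;       /* where the site should jump when the key is on */
};

/*
 * Walk the table and rewrite every site that belongs to `key`.
 * Returns the number of sites rewritten.
 */
static size_t flip_key(const void *key, int enable,
                       const struct entry *table, size_t n,
                       struct site *text)
{
    size_t patched = 0;

    assert(table != NULL && text != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].key != key)
            continue;
        struct site *s = &text[table[i].site];
        if (enable) {
            s->opcode = OP_JMP32;
            s->target = table[i].target;
        } else {
            s->opcode = OP_NOP5;
            s->target = -1;
        }
        patched++;
    }
    return patched;
}
```

The point of the sketch is just that the compiler only ever sees the
sites, while the table that ties several sites to one key lives
elsewhere, which is presumably part of why the optimizer can't tell
that two sites will always agree.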

How to merge these asm-goto constructs?

Doing so feels very special-case to the kernel and not something that
other GCC users would find useful.

I can imagine a GCC plugin that implemented a custom optimization pass
for that - basically something that spots the asm-gotos in the gimple
IR and optimizes away duplicates by replacing them with jumps. But
having read about Linus's feelings about GCC plugins recently:
https://lwn.net/Articles/851090/
I suspect that isn't going to fly (and if you're going down the
route of adding an optimization pass via a plugin, there's probably a
way to do it that doesn't involve asm). In theory, something to
optimize the asm-gotos could be relatively simple. That said, we
don't really have a dedicated GCC plugin API; plugins get all of our
internal APIs, which are liable to change from release to release,
which I know is a pain (I've managed to break one of my own plugins
with one of my own API changes at least once).
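
To make the "optimizes away duplicates" idea a bit more concrete, here's
a toy version in plain C. It is not GCC code and uses no GCC API: the IR
(struct block, K_TEST, K_JUMP, K_EXIT) and the function merge_tests are
invented for illustration. It also assumes a tree-shaped CFG in which
every block has exactly one predecessor; real code has join points, so a
real pass would need dominance information or jump threading, which this
doesn't attempt.

```c
#include <assert.h>

#define MAX_KEYS 4

enum kind { K_TEST, K_JUMP, K_EXIT };

/* Toy basic block, ending in a key test, a plain jump, or an exit. */
struct block {
    enum kind kind;
    int key;      /* K_TEST: index of the key being tested */
    int succ[2];  /* K_TEST: succ[0] = key off, succ[1] = key on;
                     K_JUMP: succ[0] is the only successor */
};

/*
 * Walk the tree from block `b`. known[k] is -1 if key k has not been
 * tested on the path so far, otherwise the outcome (0 or 1) implied by
 * the edge that was followed. A repeated test of a known key is
 * rewritten into a plain jump. Returns the number of tests folded.
 */
static int merge_tests(struct block *cfg, int b, int known[MAX_KEYS])
{
    struct block *blk = &cfg[b];
    int folded = 0;

    assert(b >= 0);
    if (blk->kind == K_EXIT)
        return 0;
    if (blk->kind == K_JUMP)
        return merge_tests(cfg, blk->succ[0], known);

    int k = blk->key;
    if (known[k] >= 0) {
        /* Outcome already decided earlier on this path: fold to a jump. */
        blk->succ[0] = blk->succ[known[k]];
        blk->kind = K_JUMP;
        return 1 + merge_tests(cfg, blk->succ[0], known);
    }

    /* First test of this key: explore both edges with the outcome noted. */
    for (int edge = 0; edge < 2; edge++) {
        known[k] = edge;
        folded += merge_tests(cfg, blk->succ[edge], known);
    }
    known[k] = -1;
    return folded;
}
```

This is only valid because of the "any invocation will return the same
result" property you described; a pass that can't assume that has no
business folding the second test.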

Hope this is constructive
Dave