Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Linus Torvalds
Date: Wed Oct 11 2023 - 15:52:22 EST


On Wed, 11 Oct 2023 at 11:42, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> The attached patch was tested on a target with fsgsbase CPUID and
> without it. It works!

.. I should clearly read all my emails before answering some of them.

Yes, that patch looks good to me, and I'm happy to hear that you
actually tested it unlike my "maybe something like this".

> The patch improves amd_pmu_enable_virt() in the same way as reported
> in the original patch submission and also reduces the number of percpu
> offset reads (either from this_cpu_off or with rdgsbase) from 1663 to
> 1571.

Do you have any actual performance numbers? The patch looks good to
me, and I *think* rdgsbase ends up being faster in practice due to
avoiding a memory access, but that's very much a gut feel.

> The only drawback is a larger binary size:
>
>     text    data     bss      dec     hex filename
> 25546594 4387686  808452 30742732 1d518cc vmlinux-new.o
> 25515256 4387814  808452 30711522 1d49ee2 vmlinux-old.o
>
> that increases by 31k (0.123%), probably due to 1578 rdgsbase alternatives.

I'm actually surprised that it increases the text size. The 'rdgsbase'
instruction should be smaller than a 'mov %gs', so I would have
expected the *data* size to increase due to the alternatives tables,
but not the text size.

[ Looks around ]

Oh. It's because we put the altinstructions into the text section.
That's kind of silly, but whatever.

So I think that increase in text-size is not "real" - yes, it
increases our binary size because we obviously have two instructions,
but the actual *executable* part likely stays the same, and it's just
that we grow the altinstruction metadata.

Linus