Re: [PATCH v2 5/6] x86/xen: Add a Xen-specific sync_core() implementation

From: Linus Torvalds
Date: Fri Dec 02 2016 - 12:32:50 EST


On Thu, Dec 1, 2016 at 4:35 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is
> ~110ns. But Xen PV will trap CPUID if possible, so IRET-to-self
> should end up being a nice speedup.

So if we care deeply about the performance of this, we should really
ask ourselves how much we need this...

There are *very* few places where we really need to do a full
serializing instruction, and I'd worry that we really don't need it in
many of the places we do this.

The only real case I'm aware of is modifying code through a different
linear address than the one it's executed from.
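
As a rough userspace illustration of what "a different linear address"
means, the sketch below maps one page twice, so a write through one
address shows up at the other. It is an analogy only, assuming a POSIX
system with a writable /tmp. map_alias_pair() is a hypothetical helper,
not a kernel interface, and nothing here executes the patched bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel API): map one page of a temporary
 * file twice, so the same physical page is visible at two different
 * linear addresses. Returns 0 on success, -1 on failure.
 */
static int map_alias_pair(unsigned char **write_view,
                          unsigned char **read_view)
{
    char path[] = "/tmp/alias-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    void *w, *r;

    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until the mappings go away */

    if (page <= 0 || ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }

    /* Two independent mappings of the same file offset. */
    w = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    r = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);

    if (w == MAP_FAILED || r == MAP_FAILED) {
        if (w != MAP_FAILED)
            munmap(w, (size_t)page);
        if (r != MAP_FAILED)
            munmap(r, (size_t)page);
        return -1;
    }

    *write_view = w;
    *read_view = r;
    return 0;
}
```

Storing a byte such as 0xc3 (an x86 "ret") through write_view and
loading it back through read_view returns the same value, because data
accesses to the two aliases are coherent. The case above is about
instruction fetch through the second address, which this sketch does
not exercise.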

Is there anything else where we _really_ need this sync-core thing?
Sure, the microcode loader looks fine, but that doesn't look
particularly performance-critical either.

So I'd like to know which sync_core is actually so
performance-critical that we care about it, and then I'd like to
understand why it's needed at all, because I suspect a number of them
have been added with the model of "sprinkle random things around and
hope".

Looking at ftrace, for example, which is one of the users, does it
actually _really_ need sync_core() at all? It seems to use the kernel
virtual address, and then the instruction stream will be coherent.

Adding Peter Anvin to the participants list, because iirc he was the
one who really talked to hardware engineers about the synchronization
issues with serializing kernel code.

Linus