Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration

From: Linus Torvalds
Date: Tue Nov 14 2017 - 12:31:38 EST


On Tue, Nov 14, 2017 at 9:10 AM, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> (* OPTION 1 *)
>> Store modified code (as data) into code segment;
>> Jump to new code or an intermediate location;
>> Execute new code;"
>
> Good point, so this is likely why I was having trouble reproducing the
> single-threaded self-modifying code incoherent case. I did have a branch
> in there.

Actually, even *without* the branch, Intel has been very good at
having precise I$ coherency. I think uou can literally store to the
next instruction, and Intel CPU's after the Pentium Pro would notice,
take a micro-fault, and handle it correctly (the i486 and Pentium did
not have that level of coherency, but a taken branch would flush the
fetch buffer).

An in-order Atom probabably has the old Pentium behavior, and you
could see it there.

But starting with the P6, and OoO execution, the "taken branch" thing
meant very little, so Intel started instead just doing the
"store-vs-instruction fetch" coherency explicitly, which causes it to
be precise.

Afaik, the only way to show incoherent I$ fairly easily is to use
virtual aliasing, and store to a different virtual address, because
the fetch buffer coherency is done by virtual address.

But even then, it's only the fetch buffer (and it's been called
different things over the years, now it's a uop loop cache), not the
L1 caches, so you get a very limited window of instructions.

And that fetch buffer is also where any cross-cpu incoherency would
be, for the exact same reason.

Linus