RE: [PATCH v5 1/5] x86/asm: Carve out a generic movdir64b() helper for general usage

From: David Laight
Date: Thu Sep 24 2020 - 06:42:26 EST


From: Borislav Petkov
> Sent: 24 September 2020 11:15
> On Thu, Sep 24, 2020 at 08:24:46AM +0000, David Laight wrote:
> > static inline void movdir64b(void *dst, const void *src)
> > {
> > /*
> > * 64 bytes from dst are marked as modified for completeness.
> > * Since the writes bypass the cache later reads may return
> > * old data anyway.
> > */
> > /* MOVDIR64B [rdx], rax */
> > asm volatile (".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> > : "=m" ((struct { char _[64];} *)dst),
> > : "m" ((struct { char _[64];} *)src), "d" (src), "a" (dst));
>
> Now since you're so generous with your advice on random threads, please
> explain what you're advising here?
>
> The destination operand - in this case in %rax - is "destination memory
> address specified as offset to ES segment in the register operand."

The movdir64b instruction does a 'normal' read of 64 bytes (which can be
misaligned), then a single cache-bypassing (probably write-combining)
64-byte write to an address that must be 64-byte aligned.
Any reference to segment registers is largely irrelevant since we are
not in real mode.
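
For illustration, the alignment requirement on the destination could be
checked with a small helper. This is only a sketch: dst_is_64b_aligned()
is a hypothetical name and is not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true when dst sits
 * on a 64-byte boundary, which is what the write side of MOVDIR64B needs.
 * The source side has no such requirement.
 */
static inline bool dst_is_64b_aligned(const void *dst)
{
	return ((uintptr_t)dst & 63) == 0;
}
```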


> So what is the difference between:
>
> ...(void *dst, ... )
>
> volatile struct { char _[64]; } *__dst = dst;
> ...
> : "=m" (__dst)
> : "a" (__dst)
>
> and
>
> ...(void *dst, ... )
> ...
> : "=m" ((struct { char _[64];} *)dst)
> : "a" (__dst)
>
> and why?
>
> Point me to the gcc documentation where this is explained.

Mainly fewer lines of code to look at.

> To cut to the chase, I don't think you need to do that, otherwise clwb()
> would be broken too but perhaps you know something I don't.
>
> Looking at clwb(), I believe the proper specification should be:
>
> volatile struct { char _[64]; } *__dst = dst;
>
> ...
>
> : "+m" (__dst)
> : "a" (__dst)

No idea what clwb() is doing.
But the "+m" (dst) tells gcc it depends on, and modifies the 64 bytes
at *dst.

I believe the 'volatile' is pointless.

> And if anything, the source specification should be something like that:
>
> volatile struct { char x[64]; } *__src = src;
>
> ...
>
>
> "d" (__src)
>
> because this tells gcc that the source operand would read 64 bytes
> through the pointer in the %rdx reg.

No, that just says the asm uses the value of the pointer.
Not what it points to.

> So this ends up close to what you're saying but it is using local
> variables to make the asm actually readable.
>
> Lemme add Micha to Cc for sanity-checking:
>
> Micha, the instruction is:
>
> MOVDIR64B %(rdx), rax
>
> "Move 64-bytes as direct-store with guaranteed 64-byte write atomicity
> from the source memory operand address to destination memory address
> specified as offset to ES segment in the register operand."
>
> Do I need to tell gcc that both operands are referencing 64 bytes,
> source operand is a memory reference, destination operand is an address
> specified in a register?
>
> What we have currently is:
>
> volatile struct { char _[64]; } *dst = __dst;
>
> /* MOVDIR64B [rdx], rax */
> asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> : "=m" (dst)
> : "d" (from), "a" (dst));

That is wrong.
Feed this into cc -S -O2 and look at the .s file:

static inline void movdir64b(void *dst, const void *src)
{
	/* MOVDIR64B [rdx], rax */
	asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
		:
		: /*"m" (*(const struct { char _[64];} *)src),*/ "d" (src), "a" (dst)
	);
}

void foo(void *dst, int val)
{
long b64[8] = { 0 };

b64[0] = val;
movdir64b(dst, b64);
}

Note that all the code that writes into b64[] is optimised away.
Repeat after uncommenting the "m" constraint and spot the difference.
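
A variant of that experiment that can actually be executed on a machine
without MOVDIR64B might look like the sketch below. It is a stand-in, not
the kernel helper: read_first_word(), struct blk64 and the plain movq load
are assumptions made for illustration, and the non-x86-64 branch is just a
memcpy() fallback. The point is the "m" operand, which tells the compiler
that the asm reads the 64 bytes behind src, so the stores into b64[] have
to be emitted before the asm.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte type, used only to give the "m" operand its size. */
struct blk64 { char _[64]; };

/*
 * Hypothetical stand-in for an instruction that reads memory through a
 * pointer held in a register. The "r" (src) operand only passes the
 * pointer value; the "m" operand is what says the pointed-to bytes are read.
 */
static inline uint64_t read_first_word(const void *src)
{
	uint64_t out;

#if defined(__x86_64__) && !defined(__ILP32__)
	__asm__ volatile("movq (%1), %0"
			 : "=r" (out)
			 : "r" (src), "m" (*(const struct blk64 *)src));
#else
	/* Fallback for other targets: an ordinary load the compiler can see. */
	memcpy(&out, src, sizeof(out));
#endif
	return out;
}

/* Mirrors foo() above: fill a local buffer, then hand it to the asm. */
static uint64_t foo(int val)
{
	uint64_t b64[8] = { 0 };

	b64[0] = (uint64_t)val;
	return read_first_word(b64);
}
```

Dropping the "m" operand from the asm leaves the compiler free to discard
the writes to b64[], in which case foo() may return stack garbage.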

The "=m" (dst) constraint is much less important here.
The write itself will always happen.
So do we need to tell gcc we did it?
Doing so just ensures gcc doesn't move any instructions that it knows
access the same memory above the movdir64b instruction.
But because this is a cache-bypassing write, those accesses are going
to be invalid anyway without extra strong barriers.
So it is fairly safe to miss it out.
OTOH putting it in does no harm and helps annotate what the
instruction is doing.

I just failed to spot an example of a 'memory size' cast in the
kernel source tree - I'm sure there is an example somewhere.

David
