Re: [patch 4/7] Immediate Values - i386 Optimization
From: Mathieu Desnoyers
Date: Wed Sep 19 2007 - 14:22:31 EST
* Jeremy Fitzhardinge (jeremy@xxxxxxxx) wrote:
> H. Peter Anvin wrote:
> > Mathieu Desnoyers wrote:
> >
> >> Ok, let's have a good look at what we want:
> >>
> >> 1 - get a pointer to the beginning of the immediate value within the
> >> instruction.
> >> 2 - make sure that the immediate value, within the instruction, is
> >> written to atomically wrt all CPUs, even on older architectures
> >> where non-aligned writes are not atomic.
> >>
> >>
> >
> > I think you'll find that even on modern architectures cross-cacheline
> > writes aren't atomic.
> >
>
> Cross-cache-line, sure. But what about just not sizeof aligned? If it's
> enough to avoid cross-cache-line, then that's simpler.
>
Being aligned on a cache-line boundary (e.g. 32-byte boundaries) implies
being aligned on sizeof multiples (e.g. 4 bytes). Therefore, if we
declare data of a certain size that is not aligned on sizeof boundaries,
it won't be aligned on cache-line boundaries either (unless I am
utterly wrong...) :)
> Which is something I was going to comment on: Mathieu, you try to align
> the constant itself, but you don't prevent the instruction overall from
> crossing a cache line. Given how delicate all this stuff is, it seems
> like a good idea to do that.
>
We just can't: movl is 5 bytes in total, 1 byte for the opcode and 4
bytes for the immediate value. But since we do not modify the opcode at
all, CPUs will see either the old or the new immediate value (each of
which will be coherent because of the atomic update) and, in every case,
they will use it with the same, untouched opcode.
>
> >> * 4 bytes
> >> B8 + rd MOV r32, imm32 (1 byte opcode)
> >> C7 /0 MOV r/m32, imm32 (2 bytes opcode)
> >> (the 2-byte opcode can be a problem)
> >>
> >>
> >
> > If gas generates the C7 opcodes by default, then that's a bug, nothing less.
> >
>
> Well, in this case, it might be preferred if it brings the constant into
> alignment without explicit padding :)
>
It will need explicit padding too. We would have to align the 4-byte
immediate value on a 4-byte multiple. Therefore, this 2-byte opcode
followed by a 4-byte immediate value would have to start 2 bytes before
a 4-byte boundary (i.e. at an address congruent to 2 modulo 4).
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68