mlehmann> Since lea (%eax),%eax's are used to create 2-7 byte nops,
mlehmann> removing any alignment will significantly reduce AGI's,
mlehmann> too, which is probably why the pentium is not getting
mlehmann> slower (it actually gets a slight advantage).
gas 2.7 generates suboptimal padding bytes, but this could be fixed.
For a two-byte NOP gas emits `lea (%esi),%esi', when it could use
`movl %esi,%esi', which is the same size but does no address
generation and so cannot itself take an AGI (address generation
interlock) stall. For some larger pads it puts two leas back to back
*both using %esi*: the second depends on the first, so the two cannot
pair and the AGI stall is guaranteed.
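
To make the two-byte case concrete, here is a rough sketch in C of how
the two candidate pads encode. The helper names (modrm, pad2_lea,
pad2_mov, pad2_selfcheck) are made up for illustration and are not
taken from the gas sources; the opcode and ModRM values are the
standard IA-32 encodings.

```c
#include <assert.h>
#include <stddef.h>

enum { REG_ESI = 6 };   /* x86 register number for %esi */

/* Build a ModRM byte from its mod, reg and r/m fields. */
static unsigned char modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (unsigned char)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/* lea (%esi),%esi: opcode 0x8D, mod=00 (memory operand form, no
   displacement), so %esi goes through address generation. */
static size_t pad2_lea(unsigned char *out)
{
    out[0] = 0x8D;
    out[1] = modrm(0, REG_ESI, REG_ESI);   /* 0x36 */
    return 2;
}

/* movl %esi,%esi: opcode 0x89, mod=11 (register operand form), so no
   address is computed. */
static size_t pad2_mov(unsigned char *out)
{
    out[0] = 0x89;
    out[1] = modrm(3, REG_ESI, REG_ESI);   /* 0xF6 */
    return 2;
}

/* Both pads are two bytes, so swapping one for the other keeps the
   alignment unchanged. */
static void pad2_selfcheck(void)
{
    unsigned char a[2], b[2];
    assert(pad2_lea(a) == pad2_mov(b));
}
```

The only difference is the opcode byte and the mod field of the ModRM
byte (00 for the memory form, 11 for the register form), which is why
the mov should be a drop-in replacement wherever the two-byte lea is
emitted today.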
-Mat