Re: jump/alignment considerations

Ian Lance Taylor (ian@cygnus.com)
Wed, 29 Jan 1997 15:49:53 -0500


From: Mat Hostetter <mat@lcs.mit.edu>
Date: 29 Jan 1997 15:37:01 -0500

>>>>> "mlehmann" == Marc Lehmann <mlehmann@hildesheim.sgh-net.de> writes:

mlehmann> Since lea (%eax),%eax's are used to create 2-7 byte nops,
mlehmann> removing any alignment will significantly reduce AGI's,
mlehmann> too, which is probably why the pentium is not getting
mlehmann> slower (it actually gets a slight advantage).

gas 2.7 generates suboptimal padding bytes, but this could be fixed.
For a two-byte NOP gas emits `lea (%esi),%esi', when it could use
`movl %esi,%esi' and avoid the possibility of an AGI (address
generation interlock) stall. For some larger pads it puts two leas
back to back, *both using %esi*; the pair cannot issue together and
the second lea is guaranteed an AGI stall.
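
To make the difference between the two candidate pads concrete, here is a
minimal sketch that splits the ModRM byte of each encoding into its fields.
The helper names (decode_modrm, uses_address_generation) are hypothetical and
not part of gas. The encoding 0x89,0xf6 for `movl %esi,%esi' is the standard
register-to-register form and is not taken from the gas source. The check only
inspects the mod field; it says nothing about real Pentium pipeline timing.

```c
/* Fields of an x86 ModRM byte. */
struct modrm { unsigned mod, reg, rm; };

/* Split a ModRM byte into mod (bits 7-6), reg (bits 5-3) and rm (bits 2-0). */
static struct modrm decode_modrm(unsigned char b)
{
    struct modrm m;
    m.mod = (b >> 6) & 3;   /* 3 = register direct, otherwise a memory form */
    m.reg = (b >> 3) & 7;   /* 6 = %esi */
    m.rm  = b & 7;          /* 6 = %esi (as base register when mod != 3) */
    return m;
}

/* Returns 1 if the operand is a memory form, i.e. the instruction has to
   compute an address from a register, which is what can trigger an AGI. */
static int uses_address_generation(unsigned char modrm_byte)
{
    return decode_modrm(modrm_byte).mod != 3;
}

/* The two 2-byte pads discussed above. */
static const unsigned char lea_pad[2] = {0x8d, 0x36};  /* lea (%esi),%esi */
static const unsigned char mov_pad[2] = {0x89, 0xf6};  /* movl %esi,%esi  */
```

Calling uses_address_generation(lea_pad[1]) reports a memory form with %esi
as the base, while uses_address_generation(mov_pad[1]) reports a plain
register operand.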

Fortunately, this type of thing is easy to fix for somebody who knows
what the best opcode sequences are. I've duplicated the relevant
function below. If somebody sends me an update, I can ensure that it
gets into the next binutils release.

Unfortunately, gas currently has no option for selecting which
processor (486, Pentium, etc.) to assemble for, and, even if it did,
gcc doesn't pass in any such option. If there are cases where it
matters, we can probably figure something out.

Ian

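void
i386_align_code (fragP, count)
     fragS *fragP;
     int count;
{
  /* Various efficient no-op patterns for aligning code labels.
     Building blocks, as decoded in 32-bit mode:
       90                     nop
       8d 36                  lea (%esi),%esi
       8d 76 00               lea 0(%esi),%esi     (8-bit displacement)
       8d 74 26 00            lea 0(%esi,1),%esi   (SIB, 8-bit displacement)
       8d b6 00 00 00 00      lea 0(%esi),%esi     (32-bit displacement)
       8d b4 26 00 00 00 00   lea 0(%esi,1),%esi   (SIB, 32-bit displacement)
       eb 0d                  jmp over the 13 nops that follow (f32_15)  */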
  static const char f32_1[] = {0x90};
  static const char f32_2[] = {0x8d,0x36};
  static const char f32_3[] = {0x8d,0x76,0x00};
  static const char f32_4[] = {0x8d,0x74,0x26,0x00};
  static const char f32_5[] = {0x90,
                               0x8d,0x74,0x26,0x00};
  static const char f32_6[] = {0x8d,0xb6,0x00,0x00,0x00,0x00};
  static const char f32_7[] = {0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};
  static const char f32_8[] = {0x90,
                               0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};
  static const char f32_9[] = {0x8d,0x36,
                               0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};
  static const char f32_10[] = {0x8d,0x76,0x00,
                                0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};
  static const char f32_11[] = {0x8d,0x74,0x26,0x00,
                                0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};
  static const char f32_12[] = {0x8d,0xb6,0x00,0x00,0x00,0x00,
                                0x8d,0xb6,0x00,0x00,0x00,0x00};
  static const char f32_13[] = {0x8d,0xb6,0x00,0x00,0x00,0x00,
                                0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};
  static const char f32_14[] = {0x8d,0xb4,0x26,0x00,0x00,0x00,0x00,
                                0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};
  static const char f32_15[] = {0xeb,0x0d,0x90,0x90,0x90,0x90,0x90,
                                0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90};
  static const char f16_4[] = {0x8d,0xb6,0x00,0x00};
  static const char f16_5[] = {0x90,
                               0x8d,0xb6,0x00,0x00};
  static const char f16_6[] = {0x8d,0x36,
                               0x8d,0xb6,0x00,0x00};
  static const char f16_7[] = {0x8d,0x76,0x00,
                               0x8d,0xb6,0x00,0x00};
  static const char f16_8[] = {0x8d,0xb6,0x00,0x00,
                               0x8d,0xb6,0x00,0x00};
  static const char *const f32_patt[] = {
    f32_1, f32_2, f32_3, f32_4, f32_5, f32_6, f32_7, f32_8,
    f32_9, f32_10, f32_11, f32_12, f32_13, f32_14, f32_15
  };
  static const char *const f16_patt[] = {
    f32_1, f32_2, f32_3, f16_4, f16_5, f16_6, f16_7, f16_8,
    f32_15, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15
  };

  if (count > 0 && count <= 15)
    {
      if (flag_16bit_code)
        {
          memcpy(fragP->fr_literal + fragP->fr_fix,
                 f16_patt[count - 1], count);
          /* Pads of 9 to 15 bytes reuse f32_15 truncated to COUNT bytes,
             so the jmp must skip the COUNT - 2 nops that remain.  */
          if (count > 8) /* adjust jump offset */
            fragP->fr_literal[fragP->fr_fix + 1] = count - 2;
        }
      else
        memcpy(fragP->fr_literal + fragP->fr_fix,
               f32_patt[count - 1], count);
      fragP->fr_var = count;
    }
}
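
The jump-offset adjustment in the 16-bit branch can be tried in isolation.
The following is a minimal standalone sketch, not part of gas: fill_jmp_pad
and jmp_pad are hypothetical names, and the buffer is a plain byte array
rather than a frag. It copies the first COUNT bytes of the jmp-over-nops
pattern and patches the displacement so that the jump lands exactly at the
end of the pad.

```c
#include <string.h>

/* Same bytes as f32_15 in the function above: a short jmp with an 8-bit
   displacement of 13, followed by 13 nops. */
static const unsigned char jmp_pad[15] = {
    0xeb, 0x0d, 0x90, 0x90, 0x90, 0x90, 0x90,
    0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90
};

/* Hypothetical helper: write a COUNT-byte pad (2 <= COUNT <= 15) consisting
   of a short jmp followed by COUNT - 2 nops.  Returns 0 on success and -1
   if COUNT is out of range. */
static int fill_jmp_pad(unsigned char *buf, int count)
{
    if (count < 2 || count > 15)
        return -1;
    memcpy(buf, jmp_pad, (size_t) count);
    /* The displacement is measured from the end of the 2-byte jmp, so it
       must equal the number of nops left after truncation. */
    buf[1] = (unsigned char) (count - 2);
    return 0;
}
```

For a 9-byte pad this yields eb 07 followed by seven nops. The gas function
only takes this path for counts above 8 in 16-bit mode; the wider range
accepted here is a choice made for the sketch.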