Re: [arjanv@redhat.com: Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline]

From: Marcelo Tosatti
Date: Mon Sep 27 2004 - 10:14:47 EST


On Thu, Sep 23, 2004 at 05:48:18PM -0700, Nakajima, Jun wrote:
> >From: Marcelo Tosatti [mailto:marcelo.tosatti@xxxxxxxxxxxx]
> >Sent: Thursday, September 23, 2004 3:32 PM
> >To: Nakajima, Jun
> >Cc: linux-kernel@xxxxxxxxxxxxxxx; akpm@xxxxxxxx; arjanv@xxxxxxxxxx;
> >ak@xxxxxxx; Saxena, Sunil; Mallick, Asit K
> >Subject: Re: [arjanv@xxxxxxxxxx: Re: [PATCH] shrink per_cpu_pages to fit
> >32byte cacheline]
> >
>
> <snip>
>
> >> >***********
> >> >
> >> >Jun,
> >> >
> >> >We need some assistance here - you can probably help us.
> >> >
> >> >Within the Linux kernel we can benefit from changing some fields
> >> >of commonly accessed data structures to 16 bit instead of 32 bits,
> >> >given that the values for these fields never reach 2 ^ 16.
> >> >
> >> >Arjan warned me, however, that the prefix (in this case "data16") will
> >> >cause an additional extra cycle in instruction decoding, per message
> >> >above.
> >>
> >> On the Pentium4 core, this is not a big deal because it runs out of the
> >> trace cache (i.e. decoded in advance). However, on the Pentium III/M
> >> (aka P6) core (i.e. Pentium III, Banias, Dothan, Yonah, etc.),
> >> especially when an operand size prefix (0x66) changes the # of bytes in
> >> an instruction (usually by impacting the size of an immediate in the
> >> instruction), the P6 core pays unnegligible penalty, slowing down
> >> decoding.
> >
> >Jun,
> >
> >What do you mean by "unnegligible penalty"?
> >
> >Do you mean it's a very small (negligible) penalty, or a considerable
> >penalty?
>
> I mean it's considerable. Did you look at what kinds of instructions are
> used for accessing such data structures? Does the operand size prefix
> change the # of bytes in those instructions (as described above) for
> most cases? If it does, we don't recommend such codes.

Yep, it does change the operand size.

It's mostly moving from the memory location into a register for
comparison, and moving back to the memory location.

The hottest path (free_hot_cold_page) changes from


105d: 8b 42 08 mov 0x8(%edx),%eax
1086: ff 83 dc 00 00 00 incl 0xdc(%ebx)

10a6: 8b 42 0c mov 0xc(%edx),%eax


to

1087: 0f b7 83 dc 00 00 00 movzwl 0xdc(%ebx),%eax
108e: 40 inc %eax
108f: 66 89 83 dc 00 00 00 mov %ax,0xdc(%ebx)

10c4: 0f b7 93 dc 00 00 00 movzwl 0xdc(%ebx),%edx


