cache alignment [ was Re: pre-2.1.132-2.. ]

Chuck Lever (chuckl@netscape.com)
Sun, 20 Dec 1998 22:17:11 -0500


i've been meaning to ask this since i saw the thread on pgcc
optimization options...

arch/i386/Makefile sets these options for most variants of x86:

-malign-jumps=2 -malign-functions=2 -malign-loops=2

1. "2" is the compiler's default for all of these options.
why are these specified at all?
2. why are these separately specifiable? under what circumstances
would you want to set all three differently?

i've been wondering what is the appropriate value for these, based
on populating the CPU cache most efficiently. i dimly recalled
that the Pentium family had a 32-byte cache line, so i hopefully set
these values to 5 (2 to the 5th is 32 bytes). performance dipped
(measured experientially and with lmbench).

i set the values to 4 and recompiled again. performance seems
about the same as before, although i haven't run lmbench again
to confirm the difference.

btw andrea -- where did you find the cache specs ??

> From: Andrea Arcangeli <andrea@e-mind.com>
> Date: Fri, 18 Dec 1998 17:05:56 +0100 (CET)
> Subject: Re: pre-2.1.132-2..
>
> On Fri, 18 Dec 1998, Linus Torvalds wrote:
>
> >
> >
> >On Fri, 18 Dec 1998, MOLNAR Ingo wrote:
> >>
> >> NO! :) On SMP, with shared data structures, we want to have 1 structure
> >> per cacheline, otherwise we'd just play cacheline ping-pong for no good
> >> reason.
> >
> >Note that we probably don't have a cache-line per CPU anyway, because
> >there is nothing to force proper alignment of the irq_desc[] array.
>
> Infact, I just noticed and fixed it myself once I understood the SMP cache
> line issue....
>
> >> (the padding is wrong in both cases btw, unless i cant count (damn, where
> >> was that calculator). We need unused[5])
> >
> >Strange counting.
>
> I guess he said we need_ed_ unused[5]. Now we need [4].
>
> >Anyway, currently the alignment is wrong, and the size is wrong. The size
> >is wrong because the original structure looked different, and one of the
> >changes must have messed that up. Oh, well.
>
> ;)
>
> >I suspect that we could actually get rid of the padding altogether, and
> >align it to a 16-byte boundary. Then it wouldn't be cache-line aligned,
>
> Intel specs say that it must have the first 5 bits zeroed to be cache
> aligned.
>
> >but it _would_ be half-cache-line aligned and thus at least partly
> >minimize the overhead.
>
> Since we just waste 64*4 byte in the struct, we could waste 16 more bytes
> to be completly aligned I think...
>
> Andrea Arcangeli
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/