Re: Interesting scheduling times - NOT

Jukka Tapani Santala (e75644@UWasa.Fi)
Wed, 23 Sep 1998 01:55:57 +0300 (EET DST)


On Tue, 22 Sep 1998, Kurt Garloff wrote:
>                                     w/32 procs        per proc
>                   proc  thread    proc  thread    proc  thread
> 2.1.120            6.5     2.8    28.3    22.0    0.68    0.60
> 2.1.122 FPU        6.0     3.9    28.1    22.1    0.69    0.57
> 2.1.122 both       4.7     2.5    16.4    11.2    0.37    0.27
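The "per proc" pair seems to be derived from the other two. In every row it matches (w/32 procs minus the first, unlabeled pair) / 32, i.e. the extra cost of each added process. That relation is my inference from the numbers and is not stated in the quote, which also gives no units. A small C check, with the figures copied from the table (`per_process` and `check_rows` are hypothetical helper names):

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the quoted table; the quote gives no units. */
struct row {
    const char *kernel;
    double base_proc, base_thread;  /* first (unlabeled) column pair   */
    double w32_proc, w32_thread;    /* "w/32 procs" column pair        */
    double per_proc, per_thread;    /* "per proc" pair, as printed     */
};

static const struct row rows[] = {
    { "2.1.120",      6.5, 2.8, 28.3, 22.0, 0.68, 0.60 },
    { "2.1.122 FPU",  6.0, 3.9, 28.1, 22.1, 0.69, 0.57 },
    { "2.1.122 both", 4.7, 2.5, 16.4, 11.2, 0.37, 0.27 },
};

/* Inferred relation, not stated in the quote: extra cost per added
 * process = (loaded - baseline) / number of added processes. */
double per_process(double base, double loaded, int nprocs)
{
    assert(nprocs > 0);
    return (loaded - base) / nprocs;
}

static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Prints derived vs. printed values and returns how many rows agree
 * with the table to the two decimals it reports. */
int check_rows(void)
{
    int agree = 0;
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        const struct row *r = &rows[i];
        double p = per_process(r->base_proc, r->w32_proc, 32);
        double t = per_process(r->base_thread, r->w32_thread, 32);
        printf("%-12s proc %.4f (table %.2f)  thread %.4f (table %.2f)\n",
               r->kernel, p, r->per_proc, t, r->per_thread);
        if (abs_diff(p, r->per_proc) <= 0.005 &&
            abs_diff(t, r->per_thread) <= 0.005)
            agree++;
    }
    return agree;
}
```

Calling check_rows() from a main() prints one line per kernel. All three rows agree with the printed "per proc" values to within rounding, e.g. (28.3 - 6.5) / 32 = 0.68125 for 2.1.120.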

I'm surprised... It's my recollection that unaligned data is far slower
than cache misses. I guess accessing byte-aligned bytes isn't that bad,
though. Still, I'd be very interested to see statistics from different
computers and, most importantly, different architectures (assuming the
structures aren't specific to one architecture; I can't check just now,
so if they are, ignore this ;). That is the unfortunate thing about
optimizations like this: they're kinda architecture-dependent.
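Here is a concrete picture of the aligned-versus-packed trade-off. This is a minimal sketch that assumes GCC or Clang, since `__attribute__((__packed__))` is a compiler extension. The two structures are hypothetical and are not the actual kernel structures from the patch. Packing removes the padding, so more fields fit in a cache line, but the `int` then sits at an odd offset and every access to it is unaligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical structure, for illustration only: natural layout. */
struct natural_layout {
    char flag;     /* followed by compiler-inserted padding           */
    int  counter;  /* offset is a multiple of alignof(int)            */
    char state;
};

/* Same fields with padding removed (GCC/Clang extension). */
struct __attribute__((__packed__)) packed_layout {
    char flag;
    int  counter;  /* directly after flag: offset 1, so unaligned     */
    char state;
};

/* The packed form holds only the field bytes, no padding. */
static_assert(sizeof(struct packed_layout) == 2 + sizeof(int),
              "packed layout should contain no padding");

size_t natural_counter_offset(void)
{
    return offsetof(struct natural_layout, counter);
}

size_t packed_counter_offset(void)
{
    return offsetof(struct packed_layout, counter);
}

/* Nonzero if an int stored at this byte offset is misaligned. */
int int_misaligned(size_t offset)
{
    return offset % alignof(int) != 0;
}
```

On a typical x86 build, the natural layout is 12 bytes with `counter` at offset 4. The packed one is 6 bytes with `counter` at offset 1. How much that misalignment costs is exactly the architecture-dependent part.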

But if you're going to optimize for special cases, see the "Optimization
Manuals" on Intel's website; they give good insight into the cache- and
burst-loading sequences on Intel architectures. I would also try to
profile with ints instead of chars, to see whether there is an even
faster trade-off between cache-line use and misalignment costs. But
then, I don't have the references in question handy to say whether that's
supposed to have any effect, either ;)
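Before profiling ints against chars, one rough way to frame the comparison is to count how many fields of each width fit in one cache line. You can also flag any field whose bytes span two lines, because on many x86 parts a load that splits a cache line costs more than one that is merely misaligned within a line. This is a sketch only. The 32-byte line size is an assumption matching the Pentium and Pentium Pro L1 caches, and it is passed as a parameter so other machines can be tried. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed line size (Pentium / Pentium Pro L1); adjust per machine. */
enum { CACHE_LINE = 32 };

/* How many fields of `field_size` bytes fit in one cache line. */
size_t fields_per_line(size_t field_size, size_t line)
{
    assert(field_size > 0 && line > 0);
    return line / field_size;
}

/* Nonzero if a field of `size` bytes at byte `offset` spans two lines:
 * its first and last bytes fall in different line-sized blocks. */
int crosses_line(size_t offset, size_t size, size_t line)
{
    assert(size > 0 && line > 0);
    return offset / line != (offset + size - 1) / line;
}
```

With 32-byte lines, 32 chars or 8 four-byte ints fit per line. A 4-byte int packed at offset 30 crosses into the next line, whereas the same int at offset 28 does not.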

-Donwulff

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/