Re: [PATCH 1/8] drivers/random: Cache align ip_random better

From: George Spelvin
Date: Wed Mar 16 2011 - 14:10:33 EST


> I'm intrigued: please educate me. On what architectures does cache-
> aligning a 48-byte buffer (previously offset by 4 bytes) speed up
> copying from it, and why? Does the copying involve 8-byte or 16-byte
> instructions that benefit from that alignment, rather than cacheline
> alignment?

I had two thoughts in my head when I wrote that:
1) A smart compiler could note the alignment and issue wider copy
instructions. (Especially on alignment-required architectures.)
2) The cacheline fetch would deliver the data sooner. The 48 bytes would
be transferred in the first 6 beats of the load from RAM (assuming a
64-bit data bus and the line arriving in order from byte 0) rather than
waiting for 7, so you'd finish the copy 1 ns or so sooner. There is a
similar 1-cycle win (3 transfers instead of 4) on a 128-bit Ln->L(n-1)
cache transfer.

As I said, "infinitesimal". The main reason that I bothered to
generate a patch was that it appealed to my sense of neatness to
keep the 3x16-byte buffer 16-byte aligned.