Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

From: Benjamin Gilbert
Date: Sun Jun 10 2007 - 12:48:50 EST

Matt Mackall wrote:
> On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
> > It's not just the loop unrolling; it's the register allocation and spilling. For comparison, I built SHATransform() from the drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty close to what you tested back then. The resulting code is 49% MOV instructions, and 80% of *those* involve memory. gcc4 is somewhat better, but it still spills a whole lot, both for the 2.6.11 unrolled code and for the current lib/sha1.c.
>
> Wait, your benchmark is comparing against the unrolled code?

No, it's comparing the current lib/sha1.c to the optimized code in the patch. I was just pointing out that the unrolled code you were likely testing against, back then, may not have been very good. (Though I assumed that you were talking about the unrolled code in random.c, not the code in CryptoAPI, so that might change the numbers some. It appears from the post you linked below that the unrolled CryptoAPI code still beat the rolled version?)
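
For anyone who wants to check instruction-mix figures like the 49% MOV / 80% memory numbers quoted above, here is a rough sketch that tallies them from `objdump -d` output (AT&T syntax). It is only an illustration: the function name `mov_stats`, the set of mnemonics counted as MOVs, and the "operand contains a parenthesis" test for memory access are assumptions of this sketch, not the method used to produce the numbers in this thread.

```python
# Mnemonics treated as plain MOVs. This set is an assumption of the sketch;
# extend it (movzbl, movsbl, ...) if those should count as well.
MOV_MNEMONICS = {"mov", "movl", "movw", "movb"}


def mov_stats(disassembly: str):
    """Return (fraction of instructions that are MOVs,
               fraction of those MOVs with a memory operand)."""
    total = movs = mem_movs = 0
    for line in disassembly.splitlines():
        # objdump -d instruction lines are tab-separated:
        #   "  addr:<TAB>opcode bytes<TAB>mnemonic operands"
        # Continuation lines that only carry extra opcode bytes have
        # two fields and are skipped here.
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        parts = fields[2].split(None, 1)
        if not parts:
            continue
        total += 1
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        if mnemonic in MOV_MNEMONICS:
            movs += 1
            # Heuristic: AT&T memory operands such as 0x8(%ebp) contain
            # parentheses. Absolute addresses without a base register
            # are missed by this check.
            if "(" in operands:
                mem_movs += 1
    mov_fraction = movs / total if total else 0.0
    mem_fraction = mem_movs / movs if movs else 0.0
    return mov_fraction, mem_fraction


if __name__ == "__main__":
    import sys

    # Usage (assumed): objdump -d sha1.o | python mov_stats.py
    mov_fraction, mem_fraction = mov_stats(sys.stdin.read())
    print(f"MOV: {mov_fraction:.0%} of instructions, "
          f"{mem_fraction:.0%} of those touch memory")
```

Restricting the input to a single function (for example by disassembling only the object file that contains it) keeps unrelated code out of the totals.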

> How big is the -code- footprint?

About 3700 bytes for the 32-bit version of sha_transform().

> Whoa. We've regressed something horrible here:
>
> In 2003, I was getting 17MB/s out of my Athlon. Now I'm getting 2.7MB/s.
> Were your tests with or without the latest /dev/urandom fixes? This
> one in particular:;a=commitdiff;h=374f167dfb97c1785515a0c41e32a66b414859a8

I'm not in front of that machine right now; I can check tomorrow. For what it's worth, I've seen equivalent performance (a few MB/s) on a range of fairly-recent kernels.
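
To compare /dev/urandom throughput across kernels, a user-space measurement along the following lines should be enough. This is a minimal sketch, not necessarily how the MB/s figures in this thread were obtained; the function name `read_throughput_mb_s`, the 16 MiB total, and the 64 KiB read size are arbitrary choices made for illustration.

```python
import time


def read_throughput_mb_s(path="/dev/urandom",
                         total_bytes=16 * 1024 * 1024,
                         chunk_size=64 * 1024):
    """Read total_bytes from path and return throughput in MB/s (10**6 bytes)."""
    remaining = total_bytes
    start = time.perf_counter()
    # buffering=0 makes each read() go straight to the kernel, so the
    # timing is not affected by Python's own buffering layer.
    with open(path, "rb", buffering=0) as f:
        while remaining > 0:
            data = f.read(min(chunk_size, remaining))
            if not data:
                raise EOFError(f"{path} ended with {remaining} bytes left to read")
            # Short reads are fine: only count what was actually returned.
            remaining -= len(data)
    # Guard against a zero interval on very small reads.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / elapsed / 1e6


if __name__ == "__main__":
    print(f"/dev/urandom: {read_throughput_mb_s():.1f} MB/s")
```

Running it several times and on an otherwise idle machine gives more stable numbers, since a single pass includes scheduling noise.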

--Benjamin Gilbert