Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+
From: Benjamin Gilbert
Date: Sun Jun 10 2007 - 12:48:50 EST
Matt Mackall wrote:
On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
It's not just the loop unrolling; it's the register allocation and
spilling. For comparison, I built SHATransform() from the
drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and
SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty
close to what you tested back then. The resulting code is 49% MOV
instructions, and 80% of *those* involve memory. gcc4 is somewhat
better, but it still spills a whole lot, both for the 2.6.11 unrolled
code and for the current lib/sha1.c.
Wait, your benchmark is comparing against the unrolled code?
No, it's comparing the current lib/sha1.c to the optimized code in the
patch. I was just pointing out that the unrolled code you were likely
testing against, back then, may not have been very good. (Though I
assumed that you were talking about the unrolled code in random.c, not
the code in CryptoAPI, so that might change the numbers some. It
appears from the post you linked below that the unrolled CryptoAPI code
still beat the rolled version?)
How big is the -code- footprint?
About 3700 bytes for the 32-bit version of sha_transform().
Whoa. We've regressed something horrible here:
In 2003, I was getting 17MB/s out of my Athlon. Now I'm getting 2.7MB/s.
Were your tests with or without the latest /dev/urandom fixes? This
one in particular:
I'm not in front of that machine right now; I can check tomorrow. For
what it's worth, I've seen equivalent performance (a few MB/s) on a
range of fairly-recent kernels.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/