Re: [PATCH] aes-i586-asm.S optimization

From: James Morris
Date: Thu Aug 12 2004 - 14:13:20 EST


On Thu, 12 Aug 2004, Denis Vlasenko wrote:

> > However, 5% is too high. My hack eliminated a register move,
> > which is very fast operation. It shouldn't have such noticeable
> > effect. P4 is a strange beast.
>
> Ok, here is two more patches. opt3.patch does optimisation (similar
> to one in opt.patch) to decrypt routine. Done on top of opt.patch.

This one decreases performance.

> opt4.patch is an experimental one which does register spills
> into stack not by mov's, but by push/pop. May be faster because
> push/pop pairs often get special treatment in hardware in todays
> processors, because it is such a typical operation.
> Also, each mov is four bytes of code, whereas push or pop
> is only one. Together with eliminated stack adjustment code
> this shortens object code by 348 bytes.

This one decreases performance again.

I'm using apachebench over loopback with IPSec to test this, so perhaps
not the best 'raw' benchmark, but I think a reasonable macro benchmark.


- James
--
James Morris
<jmorris@xxxxxxxxxx>


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/