Re: Csum and csum copyroutines benchmark

From: Denis Vlasenko (vda@port.imtp.ilyichevsk.odessa.ua)
Date: Fri Oct 25 2002 - 11:00:37 EST


On 25 October 2002 08:19, Alan Cox wrote:
> On Fri, 2002-10-25 at 14:59, Denis Vlasenko wrote:
> > Well, that makes it run entirely in L0 cache. This is unrealistic
> > for actual use. movntq is x3 faster when you hit RAM instead of L0.
> >
> > You need to be more clever than that - generate pseudo-random
> > offsets in large buffer and run on ~1K pieces of that buffer.
>
> In a lot of cases its extremely realistic to assume the network
> buffers are in cache. The copy/csum path is often touching just
> generated data, or data we just accessed via read(). The csum RX path
> from a card with DMA is probably somewhat different.

'Touching' is not interesting since it will pump data
into cache, no matter how you 'touch' it.

Running benchmarks against 1K static buffer makes cache red hot
and causes _all writes_ to hit it. It may lead to wrong conclusions.

Is _dst_ buffer of csum_copy going to be used by processor soon?
If yes, we shouldn't use movntq, we want to cache dst.
If no, we should by all means use movntq.
If sometimes, then optimal strategy does not exist. :(

--
vda
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Oct 31 2002 - 22:00:27 EST