Re: [Fwd: Re: [PATCH v2 2/2] x86: add prefetching to do_csum]

From: Neil Horman
Date: Wed Nov 13 2013 - 07:30:33 EST


On Wed, Nov 13, 2013 at 10:09:51AM -0000, David Laight wrote:
> > Sure, I modified the code so that we only prefetched 2 cache lines ahead, but
> > only if the overall length of the input buffer is more than 2 cache lines.
> > Below are the results (all counts are the average of 1000000 iterations of the
> > csum operation, as previous tests were, I just omitted that column).
>
> Hmmm.... averaging over 100000 iterations means that all the code
> is in the i-cache and the branch predictor will be correctly primed.
>
> For short checksum requests I'd guess that the relevant data
> has just been written and is already in the cpu cache (unless
> there has been a process and cpu switch).
> So prefetch is likely to be unnecessary.
>
> If you assume that the checksum code isn't in the i-cache then
> small requests are likely to be dominated by the code size.
>
I'm not sure. What's the typical capacity of a branch predictor to remember
code paths? I ask because the most likely use of do_csum will be in
the receive path of the networking stack (specifically in the softirq handler).
So if we run do_csum once, we're likely to run it many more times as we clean
out an adapter's receive queue.
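
For reference, the prefetch policy quoted above (prefetch 2 cache lines ahead,
but only if the buffer is longer than 2 cache lines) can be sketched roughly as
below. This is a user-space illustration, not the patch itself and not the
kernel's do_csum: csum_sketch and CACHE_LINE are hypothetical names, the
64-byte line size is an assumption, and __builtin_prefetch is the GCC/Clang
builtin standing in for whatever prefetch primitive the real code uses.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size; real hardware may differ */

/*
 * Hypothetical helper, not the kernel's do_csum(): 16-bit one's-complement
 * sum over buf (bytes taken in network order), folded to 16 bits and not
 * inverted.
 */
static uint16_t csum_sketch(const uint8_t *buf, size_t len)
{
	/* Only bother prefetching when the buffer exceeds 2 cache lines. */
	const int do_prefetch = len > 2 * CACHE_LINE;
	uint64_t sum = 0;
	size_t i = 0;

	while (i + 1 < len) {
		/*
		 * Once per CACHE_LINE bytes of offset, hint the line that is
		 * two lines ahead, as long as it is still inside the buffer.
		 * This tracks the offset only, not the buffer's alignment.
		 */
		if (do_prefetch && (i % CACHE_LINE) == 0 &&
		    i + 2 * CACHE_LINE < len)
			__builtin_prefetch(buf + i + 2 * CACHE_LINE);

		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
		i += 2;
	}

	/* An odd trailing byte is treated as the high byte of a final word. */
	if (i < len)
		sum += (uint32_t)buf[i] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

Buffers of 2 cache lines or less never reach the prefetch, so the only extra
cost on the short path is the length comparison and the per-iteration test.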

Neil

> David