Re: [PATCH] Fast csum_partial_copy_generic and more

From: Artur Skawina (skawina@geocities.com)
Date: Fri May 19 2000 - 15:15:47 EST


kumon@flab.fujitsu.co.jp wrote:
>
> Unfortunately, the AS version does not show a significant gain. If the
> cache is hit, it may show some advantage. But unfortunately, in the

quite possible. it seems, assuming your numbers are accurate, i gave
up investigating the prefetching too early. it was pretty obvious
that on a p3 the prefetch instructions would give a speedup, but
i wasn't sure the dummy read overhead would be worth it on p2.

[if anybody wants to play with prefetch, you could start by
 adding two "prefetch" insns to the top of the loop. As these
 should do the right thing, won't generate exceptions and can
 be trivially bypassed for older cpus, i'd expect the results
 to be even more spectacular. I don't have a prefetch-capable
 cpu to test this on, however...]
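
[a minimal user-space sketch of the idea above, in C rather than the
 kernel's assembly. it is not csum_partial_copy_generic: the function
 name, the 32-byte unroll, the PREFETCH_AHEAD distance and the choice
 of one hint for the source and one for the destination are all
 assumptions made for illustration. __builtin_prefetch is the
 GCC/Clang builtin; it is only a hint and does not fault on an
 address past the end of the buffer.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical prefetch distance in bytes; would need tuning per CPU. */
#define PREFETCH_AHEAD 64

/*
 * Copy len bytes from src to dst and return a 32-bit partial sum of the
 * data, with carries folded back in (end-around carry).
 * For brevity len must be a multiple of 32; a real routine also needs
 * head and tail handling.
 */
static uint32_t csum_copy_prefetch(void *dst, const void *src, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint64_t sum = 0;

    assert(dst != NULL && src != NULL);
    assert(len % 32 == 0);

    for (size_t off = 0; off < len; off += 32) {
        /*
         * The two hints at the top of the loop. The addresses are
         * computed as integers so that running past the end of the
         * buffers is harmless; a prefetch never raises an exception.
         */
        __builtin_prefetch((const void *)((uintptr_t)s + off + PREFETCH_AHEAD), 0, 0);
        __builtin_prefetch((const void *)((uintptr_t)d + off + PREFETCH_AHEAD), 1, 0);

        /* Copy and sum eight 32-bit words per iteration. */
        for (size_t i = 0; i < 32; i += 4) {
            uint32_t w;
            memcpy(&w, s + off + i, sizeof w);
            memcpy(d + off + i, &w, sizeof w);
            sum += w;
        }
    }

    /* Fold the carries out of the upper half back into the low 32 bits. */
    while (sum >> 32)
        sum = (sum & 0xffffffffu) + (sum >> 32);

    return (uint32_t)sum;
}
```

[on a cpu without prefetch support the two hints could simply be
 compiled out, which is the "trivially bypassed" part.]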

> Strictly speaking, this prefetch may read just past the source region, by
> at most 3 bytes. But it never causes trouble, because this excessive area

what you could do is to not use SRC(), but have a dummy exception
handler. (yeah, this would solve Andrea's "buffer overflow" too ;)

I'll play with the patch, try to reproduce your numbers, and see
if merging both patches would be a win.
It likely won't happen until after the weekend, however.

artur



