[PATCH] Fast csum_partial_copy_generic and more

From: kumon@flab.fujitsu.co.jp
Date: Fri May 19 2000 - 22:50:38 EST


kumon@flab.fujitsu.co.jp writes:
> Strictly speaking, this prefetch may read up to 3 bytes past the end of
> the source region. But it never causes trouble, because those extra
> bytes and the last transferred byte reside in the same cache block.

Sorry, I mixed up which patch version the explanation belongs to.
The above comment is based on long-word prefetching, but what I
actually posted is the byte-prefetching version.

In the posted version, the above comment does not apply.
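
For the long-word variant, the quoted claim can be checked with a little
address arithmetic. The following is only a sketch in C, not part of the
patch. It assumes a 32-byte cache line and a 4-byte-aligned long-word
load; neither is stated in this mail, and the helper names are
hypothetical.

```c
#include <stdint.h>

#define CACHE_LINE 32u  /* assumed cache line size, not stated in the mail */
#define LONG_SIZE   4u  /* size of a movl load */

/* Index of the cache line containing addr. */
static uintptr_t line_of(uintptr_t addr)
{
    return addr / CACHE_LINE;
}

/* Start of the aligned long word that covers the byte at `last`. */
static uintptr_t long_start(uintptr_t last)
{
    return last & ~(uintptr_t)(LONG_SIZE - 1);
}

/* Bytes read past `last` by the aligned long-word load covering it. */
static unsigned overread(uintptr_t last)
{
    return (unsigned)(long_start(last) + LONG_SIZE - 1 - last);
}

/* 1 if the whole long word covering `last` lies in last's cache line. */
static int same_line(uintptr_t last)
{
    uintptr_t start = long_start(last);
    return line_of(start) == line_of(last) &&
           line_of(start + LONG_SIZE - 1) == line_of(last);
}

/*
 * Scan addresses 0..span-1 as the "last transferred byte".
 * Returns the largest over-read seen, or -1 if any load left the line.
 */
static int max_overread(uintptr_t span)
{
    int max = 0;
    for (uintptr_t last = 0; last < span; last++) {
        if (!same_line(last))
            return -1;
        if ((int)overread(last) > max)
            max = (int)overread(last);
    }
    return max;
}
```

Under these assumptions max_overread(128) returns 3: the load never
reads more than 3 bytes past the last byte and never leaves its cache
line. The alignment assumption matters here. An unaligned long-word
load starting at, say, byte 30 of a line would spill into the next line.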

I measured both versions; the performance difference between
long-word prefetching and byte prefetching is barely noticeable.

According to the Intel documentation, a byte access to part of a
long-word register may cause a partial-register stall, so I think it is
better to use movl instead of movb. IMHO, this applies to my case.

Anyway, the following is architecturally better but ethically worse:

+ SRC(movl -32(%edx),%ebx) ; SRC(movl (%edx),%ebx)
