Re: Speed of memcpy, csum_partial and csum_partial_copy

Robert L Krawitz (rlk@tiac.net)
Sat, 8 Jun 1996 11:08:41 -0400


Date: Fri, 7 Jun 96 18:19 BST
From: Jamie Lokier <jamie@rebellion.co.uk>

Someone from Intel told me that the fastest way to copy memory on a
Pentium is to preload about a page's worth of data into the cache, by
touching every 32nd byte. Then proceed with the fastest copy loop you
can, to saturate the write buffers. Alternate as necessary for large
copies. This is supposed to be faster because you avoid most of the
DRAM page misses when turning around from read to write, and vice versa.
The example timings for page misses that he quoted would seem to bear
this out as worthwhile. More so than using 64-bit writes, on a fast
Pentium. (He said the chipset would merge 32-bit writes once they got
out of the CPU anyway).
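
A minimal sketch of the preload-then-copy scheme described above, in C
rather than assembler. The function name preload_copy, the 4096-byte
chunk size, and the use of memcpy as a stand-in for "the fastest copy
loop you can" are assumptions made for illustration. Only the 32-byte
stride comes from the message itself.

```c
#include <stddef.h>
#include <string.h>

#define LINE_SIZE  32    /* stride from the message: touch every 32nd byte */
#define CHUNK_SIZE 4096  /* assumed: "about a page's worth" taken as 4 KB */

/* Hypothetical helper: copy n bytes from src to dst, pulling each chunk
   of the source into the cache before copying it. */
void *preload_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n > 0) {
        size_t len = n < CHUNK_SIZE ? n : CHUNK_SIZE;
        size_t i;

        /* Pass 1: read one byte per cache line. The volatile access
           keeps the compiler from discarding the otherwise unused load. */
        for (i = 0; i < len; i += LINE_SIZE)
            (void)*(const volatile unsigned char *)(s + i);

        /* Pass 2: copy the chunk that should now be cached. memcpy is
           only a placeholder for a hand-tuned copy loop. */
        memcpy(d, s, len);

        d += len;
        s += len;
        n -= len;
    }
    return dst;
}
```

Whether the first pass pays for itself depends on the chipset and the
cache, so a sketch like this is only useful as something to time.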

At least on my system (Neptune chipset), this is not the case. 64-bit
writes are faster than preloading the cache. I suspect you have as
many page misses with this method, anyway -- at some point, the cache
has to start kicking out lines.
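
For comparison, a rough sketch of a copy loop that moves 64 bits at a
time. The message does not say how the 64-bit writes were issued, so
this is not the code that was measured: the name copy64 is hypothetical,
and on a 32-bit target the compiler may well split each uint64_t move
into two 32-bit ones, in which case getting real 64-bit stores would
take assembler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, 8 at a time where possible. */
void *copy64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Main loop: one 64-bit load and one 64-bit store per iteration.
       Going through memcpy keeps this valid for unaligned pointers;
       compilers typically reduce each call to a plain move. */
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Tail: copy the remaining 0 to 7 bytes one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```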

However, preloading the cache will be faster if the amount of data to
be copied is small, AND if the copy will be used shortly afterward.
In the Linux kernel, the amount of data to be copied is large (pages),
and it will not be used immediately afterward.

-- 
Robert Krawitz <rlk@tiac.net>           http://www.tiac.net/users/rlk/

Member of the League for Programming Freedom -- mail lpf@uunet.uu.net
Tall Clubs International -- tci-request@aptinc.com or 1-800-521-2512