Re: [beta patch] SSE copy_page() / clear_page()

From: Andrew Morton (andrewm@uow.edu.au)
Date: Fri Feb 16 2001 - 10:27:02 EST


Manfred Spraul wrote:
>
> Intel Pentium III and P 4 have hardcoded "fast stringcopy" operations
> that invalidate whole cachelines during write (documented in the most
> obvious place: multiprocessor management, memory ordering)

Which are dramatically slower than a simple `mov' loop for just
about all alignments, except for source and dest both eight-byte
aligned.

For example, copying an unchached source to an uncached dest,
with the source misaligned, my PIII Coppermine does 108 MBytes/sec
with `rep;movsl' and 149 MBytes/sec with an open-coded variant
of our copy_csum routines. That's a lot. Similar results
on a PII and a PIII Katmai.

On the K6-2, however, the string operation is almost always
a win.

It seems that a good approximation for our bulk-copy strategy is:

        if (AMD) {
                string_copy();
        } else if (intel) {
                if ((source|dest) & 7)
                        duff_copy();
                else
                        string_copy();
        } else {
                quack();
        }

This will make our Intel copies 20-40% faster than
at present, depending upon the distribution of
alignments. (And for networking, the distribution
is pretty much uniform).

Somewhere on my to-do list is getting lots of people to
test lots of architectures with lots of combinations of
[source/dest][cached/uncached] at lots of alignments
to confirm if this will work.

If you have time, could you please grab

        http://www.uow.edu.au/~andrewm/linux/cptimer.tar.gz

and teach it how to do SSE copies, in preparation for this
great event?

Thanks.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Feb 23 2001 - 21:00:13 EST