Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()

From: H. Peter Anvin
Date: Sat Feb 28 2009 - 20:44:50 EST


Arjan van de Ven wrote:
>
> the reason that movntq and co are faster is because you avoid the
> write-allocate behavior of the caches....
>
> the cache polluting part of it I find hard to buy for general use (as
> this discussion shows)... that will be extremely hard to measure as
> a real huge thing, while the WA part is like a 1.5x to 2x thing.
>

Note that hardware *can* (which is not the same thing as hardware
*will*) elide the write-allocate behavior. We did that at Transmeta for
rep movs and certain other instructions which provably filled in entire
cache lines. I haven't investigated if newer Intel CPUs do that in the
"fast rep movs" case.
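
For anyone who wants to see the two store paths side by side, here is a minimal user-space sketch of a copy that uses non-temporal stores through the SSE2 intrinsics. The function name copy_nontemporal and its alignment requirement are assumptions made for illustration; this is not the kernel's __copy_from_user_*nocache() code, which is written in assembly and handles faults and misalignment differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* _mm_sfence */
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128 */

/*
 * Hypothetical helper (illustration only): copy len bytes from src to dst
 * using non-temporal 16-byte stores, so the destination lines do not have
 * to be read into the cache before being overwritten.
 *
 * Assumption: dst is 16-byte aligned, as _mm_stream_si128 requires.
 */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t chunks = len / 16;
    size_t i;

    assert(((uintptr_t)dst & 15) == 0);

    for (i = 0; i < chunks; i++) {
        /* Ordinary (possibly unaligned) load from the source. */
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i * 16));
        /* Non-temporal store: bypasses write-allocate on the destination. */
        _mm_stream_si128((__m128i *)(d + i * 16), v);
    }

    /* Non-temporal stores are weakly ordered; fence before others read dst. */
    _mm_sfence();

    /* Tail of fewer than 16 bytes goes through the normal cached path. */
    memcpy(d + chunks * 16, s + chunks * 16, len - chunks * 16);
}
```

As a rough traffic model (a simplification, not a measurement): a cached copy of N bytes reads N bytes of source, reads N bytes of destination for the write-allocate, and eventually writes N bytes back, so 3N in total against 2N for the non-temporal version, i.e. 1.5x. For a pure fill the same model gives 2N against N, i.e. 2x, which is consistent with the range Arjan quotes. Hardware that elides the write-allocate for full-line writes would get the 2N figure with ordinary stores.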

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
