Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()

From: David Miller
Date: Sat Feb 28 2009 - 19:07:25 EST


From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Sat, 28 Feb 2009 09:42:18 -0800 (PST)

> On Sat, 28 Feb 2009, Arjan van de Ven wrote:
> >
> > it invalidates all caches in the hierarchy
>
> Yeah, now that I look at the intel pdf's, I see that.
>
> > afaik this is what Intel cpus do; but I also thought this behavior was
> > quite architectural as well...
>
> Ok, I really think we should definitely not use non-temporal stores for
> anything smaller than one full page in that case. In fact, I wonder if
> even any of the old streaming benchmarks are even true. I thought it would
> still stay in the L3, but yes, it literally seems to make the access
> totally noncached and WC.
>
> That's almost unacceptable in the long run. With an 8MB L3 cache - and a
> compile sequence, do we really want to go out to memory to write the .S
> file, and then have the assembler go out to memory to read it back? For a
> compile, that _probably_ is all fine (the compiler in particular will have
> enough data structures around that it's not going to fit in the cache
> anyway), but I'm seeing leaner compilers and other cases where forcing
> things out all the way on the bus is simply the wrong thing.

I think this is an accurate analysis as well. It's really unfortunate
that non-temporal stores on x86 don't preserve existing cache lines
when they are already present.

I thought that was the whole point. Don't pollute the caches, but
if cache lines are already loaded there, use them and don't purge!
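
To make the "nothing smaller than one full page" suggestion concrete,
here is a rough userspace sketch, not the kernel's actual
__copy_from_user_*nocache() code. The function name
copy_maybe_nocache(), the NT_THRESHOLD constant and the 4096-byte page
size are all made up for illustration. The caller passes in the
'total' size of the whole operation, and the streaming path is only
taken when that total is at least one page and the destination is
16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Assumption: one 4096-byte page is the cutoff. Hypothetical constant. */
#define NT_THRESHOLD 4096

/*
 * Copy 'len' bytes of a transfer that is 'total' bytes overall.
 * Returns 1 if non-temporal stores were used, 0 if plain memcpy was.
 */
int copy_maybe_nocache(void *dst, const void *src, size_t len, size_t total)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(len <= total);

#if defined(__SSE2__)
    /* Small transfers, or a misaligned destination, stay cached. */
    if (total >= NT_THRESHOLD && ((uintptr_t)d & 15) == 0) {
        size_t i = 0;

        /* Stream 16 bytes at a time; the stores bypass the caches. */
        for (; i + 16 <= len; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
            _mm_stream_si128((__m128i *)(d + i), v);
        }
        /* Non-temporal stores are weakly ordered; fence before returning. */
        _mm_sfence();

        /* Tail of fewer than 16 bytes goes through the normal path. */
        memcpy(d + i, s + i, len - i);
        return 1;
    }
#endif
    memcpy(d, s, len);
    return 0;
}
```

With a cutoff like this, a short write such as a small .S file stays
in the cache for the next reader, and only page-sized or larger
transfers bypass it. Whether one page is the right cutoff is exactly
the open question above.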