Re: [v3 0/9] parallelized "struct page" zeroing

From: Matthew Wilcox
Date: Wed May 10 2017 - 13:17:12 EST


On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote:
> From: Michal Hocko <mhocko@xxxxxxxxxx>
> Date: Wed, 10 May 2017 16:57:26 +0200
>
> > Have you measured that? I do not think it would be super hard to
> > measure. I would be quite surprised if this added much if anything at
> > all as the whole struct page should be in the cache line already. We do
> > set reference count and other struct members. Almost nobody should be
> > looking at our page at this time and stealing the cache line. On the
> > other hand a large memcpy will basically wipe everything away from the
> > cpu cache. Or am I missing something?
>
> I guess it might be clearer if you understand what the block
> initializing stores do on sparc64. There are no memory accesses at
> all.
>
> The cpu just zeros out the cache line, that's it.
>
> No L3 cache line is allocated. So this "wipe everything" behavior
> will not happen in the L3.

There's either something wrong with your explanation or my reading
skills :-)

"There are no memory accesses"
"No L3 cache line is allocated"

You can have one or the other ... either the CPU sends a cacheline-sized
write of zeroes to memory without allocating an L3 cache line (maybe
using the store buffer?), or the CPU allocates an L3 cache line and sets
its contents to zeroes, probably putting it in the last way of the set
so it's the first thing to be evicted if not touched.

Or there's some magic in the memory bus protocol where the CPU gets to
tell the DRAM "hey, clear these cache lines". Although that's also a
memory access of sorts ...
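
To make the two cache-side alternatives concrete, here is a toy model of a
single 4-way set under a stream of zeroing stores. It is only a sketch: the
names (hot_lines_after, the policy enum, the tag values) are made up for
illustration, and nothing here claims to describe what sparc64 hardware
actually does. It just counts how many previously hot lines survive when the
store (a) allocates normally as most-recently-used, (b) allocates in the
last way so it is evicted first, or (c) does not allocate and goes to memory.

```c
#include <assert.h>
#include <string.h>

#define WAYS 4

/* Hypothetical policies, named for this sketch only. */
enum policy { ALLOCATE_MRU, ALLOCATE_LRU, NO_ALLOCATE };

/* One cache set: tags[0] is most recently used, tags[WAYS-1] is evicted next. */
struct cache_set {
	long tags[WAYS];
};

/*
 * Fill the set with "hot" lines tagged 1..WAYS, push `stores` zeroing
 * stores (tags 100, 101, ...) through it under policy `p`, and return
 * how many hot lines are still resident. *mem_writes counts stores that
 * bypassed the cache; eventual writeback of dirty lines is not modelled.
 */
static int hot_lines_after(enum policy p, int stores, long *mem_writes)
{
	struct cache_set s;
	int i, left = 0;

	assert(stores >= 0);
	*mem_writes = 0;

	for (i = 0; i < WAYS; i++)
		s.tags[i] = i + 1;

	for (i = 0; i < stores; i++) {
		long tag = 100 + i;

		switch (p) {
		case ALLOCATE_MRU:
			/* Evict the LRU way, insert the new line as MRU. */
			memmove(&s.tags[1], &s.tags[0],
				(WAYS - 1) * sizeof(s.tags[0]));
			s.tags[0] = tag;
			break;
		case ALLOCATE_LRU:
			/* Reuse the last way; the zeroed line is evicted first. */
			s.tags[WAYS - 1] = tag;
			break;
		case NO_ALLOCATE:
			/* Cache untouched; the zeroes go out to memory. */
			(*mem_writes)++;
			break;
		}
	}

	for (i = 0; i < WAYS; i++)
		if (s.tags[i] >= 1 && s.tags[i] <= WAYS)
			left++;

	return left;
}
```

With 8 stores, the model leaves 0 hot lines for ALLOCATE_MRU, 3 for
ALLOCATE_LRU, and all 4 for NO_ALLOCATE (at the cost of 8 memory writes).
So in this model, both of the alternatives I described avoid the "wipe
everything" effect, but neither one manages to do it with no memory access
and no cache line allocated.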