Re: [v3 0/9] parallelized "struct page" zeroing

From: David Miller
Date: Wed May 10 2017 - 14:00:35 EST


From: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Date: Wed, 10 May 2017 10:17:03 -0700

> On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote:
>> From: Michal Hocko <mhocko@xxxxxxxxxx>
>> Date: Wed, 10 May 2017 16:57:26 +0200
>>
>> > Have you measured that? I do not think it would be super hard to
>> > measure. I would be quite surprised if this added much, if anything at
>> > all, as the whole struct page should be in the cache line already. We do
>> > set the reference count and other struct members. Almost nobody should be
>> > looking at our page at this time and stealing the cache line. On the
>> > other hand, a large memcpy will basically wipe everything away from the
>> > cpu cache. Or am I missing something?
>>
>> I guess it might be clearer if you understand what block-initializing
>> stores do on sparc64. There are no memory accesses at
>> all.
>>
>> The cpu just zeros out the cache line; that's it.
>>
>> No L3 cache line is allocated. So this "wipe everything" behavior
>> will not happen in the L3.
>
> There's either something wrong with your explanation or my reading
> skills :-)
>
> "There are no memory accesses"
> "No L3 cache line is allocated"
>
> You can have one or the other ... either the CPU sends a cacheline-sized
> write of zeroes to memory without allocating an L3 cache line (maybe
> using the store buffer?), or the CPU allocates an L3 cache line and sets
> its contents to zeroes, probably putting it in the last way of the set
> so it's the first thing to be evicted if not touched.

There is no conflict in what I said.

Only an L2 cache line is allocated and cleared. Nothing is read from
memory, and the L3 is left alone.
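The two claims that looked contradictory ("no memory accesses" and "no L3 cache line is allocated") can be reconciled with a toy counter model. This is a sketch, not a description of any real sparc64 implementation: the `Counters` type and the `zero_lines()` helper are hypothetical, and so is the behaviour assumed for an ordinary store miss (line read from memory, then filled into both L3 and L2). Only the block-init path (L2 line allocated and cleared, no memory read, L3 untouched) is taken from this thread.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Per-level traffic caused by zeroing a run of cache lines (toy model)."""
    mem_reads: int = 0  # lines fetched from memory
    l3_allocs: int = 0  # lines installed in the L3
    l2_allocs: int = 0  # lines installed (and zeroed) in the L2


def zero_lines(nlines: int, block_init: bool) -> Counters:
    """Hypothetical model of zeroing `nlines` lines that miss in every level.

    block_init=False: ASSUMED ordinary write-allocate miss. The old line is
    read from memory and filled into both L3 and L2 before being zeroed.

    block_init=True: block-initializing store as described in this thread.
    The L2 line is allocated and cleared in place; memory is not read and
    no L3 line is allocated.
    """
    c = Counters()
    for _ in range(nlines):
        if not block_init:
            c.mem_reads += 1  # assumed read-for-ownership from memory
            c.l3_allocs += 1  # assumed fill into the L3
        c.l2_allocs += 1      # both paths end with a zeroed L2 line
    return c


if __name__ == "__main__":
    for mode in (False, True):
        label = "block_init" if mode else "ordinary  "
        print(label, zero_lines(1024, mode))
```

The model deliberately ignores what happens when the dirty, zeroed L2 line is eventually evicted; it only counts the traffic caused by the zeroing stores themselves.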