Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression

From: Andi Kleen
Date: Wed Sep 01 2021 - 22:23:43 EST



On 9/1/2021 6:35 PM, Feng Tang wrote:
On Wed, Sep 01, 2021 at 08:12:24AM -0700, Andi Kleen wrote:
Feng Tang <feng.tang@xxxxxxxxx> writes:
Yes, in the tests I did, no matter where the 128B padding is added, the
performance is restored and even improved.
I wonder if we can find some cold, rarely accessed, data to put into the
padding to not waste it. Perhaps some name strings? Or the destroy
support, which doesn't sound like it's commonly used.
Yes, I tried moving 'destroy_work', 'destroy_rwork' and 'parent' to just
before 'refcnt', together with some padding. That brought the regression
down to about 10~15%. (debug patch pasted below)

But I'm not sure if we should use it, before we can fully explain the
regression.

Narrowing it down to a single prefetcher seems good enough to me. The behavior of the prefetchers is fairly complicated and hard to predict, so I doubt you'll ever get a 100% step-by-step explanation.
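
To illustrate the layout idea discussed above, here is a minimal sketch in plain C. It is not the debug patch and not the real kernel structure: 'struct demo_state', 'struct work_stub', 'pair_index' and 'PAIR_BYTES' are hypothetical names, and it assumes the prefetcher in question works on 128-byte aligned pairs of 64-byte cache lines, which would be consistent with the 128B padding helping. Only the field names 'parent', 'destroy_work', 'destroy_rwork' and 'refcnt' come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed granularity: two adjacent 64-byte cache lines. */
#define PAIR_BYTES 128

/* Hypothetical stand-in for a work item; not the kernel's work_struct. */
struct work_stub {
    void (*fn)(void *);
    void *arg;
};

/* Hypothetical layout, not the real structure touched by the patch. */
struct demo_state {
    /* Read-mostly fields that many CPUs load. */
    unsigned long flags;
    int id;

    /* Cold fields placed in the gap so the padding is not pure waste. */
    struct demo_state *parent;
    struct work_stub destroy_work;
    struct work_stub destroy_rwork;

    /* Frequently written counter starts a new 128-byte block. */
    _Alignas(PAIR_BYTES) long refcnt;
};

/* Index of the 128-byte block that contains a given field offset. */
static inline size_t pair_index(size_t offset)
{
    return offset / PAIR_BYTES;
}

/* Compile-time check that the counter really begins its own block. */
static_assert(offsetof(struct demo_state, refcnt) % PAIR_BYTES == 0,
              "refcnt must start a 128-byte block");
```

The alignment specifier makes the compiler insert whatever padding is left after the cold fields, so 'refcnt' never shares a 128-byte block with the read-mostly fields. The guarantee only holds for objects that are themselves 128-byte aligned; a heap allocation would need an aligned allocator.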


-Andi