Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

From: Yang Shi
Date: Mon Nov 05 2018 - 15:18:16 EST




On 11/5/18 10:35 AM, Linus Torvalds wrote:
> On Mon, Nov 5, 2018 at 10:28 AM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
>> Actually, the commit is mainly for optimizing the long stall time caused
>> by holding mmap_sem by write when unmapping or shrinking large mapping.
>> It downgrades write mmap_sem to read when zapping pages. So, it looks
>> the downgrade incurs more context switches. This is kind of expected.
>>
>> However, the test looks just shrink the mapping with one normal 4K page
>> size. It sounds the overhead of context switches outpace the gain in
>> this case at the first glance.
> I'm not seeing why there should be a context switch in the first place.
>
> Even if you have lots of concurrent brk() users, they should all block
> exactly the same way as before (a write lock blocks against a write
> lock, but it *also* blocks against a downgraded read lock).

Yes, that is true. The brk() users will not get woken up. What I can think of for now is that there might be other helper processes and/or kernel threads waiting to take mmap_sem for read. They might get woken up by the downgrade.

But I also saw a huge increase in CPU idle time and sched_goidle events. I don't have a clue yet as to why idle goes up.

  20610709 ± 15%  +2376.0%  5.103e+08 ± 34%  cpuidle.C1.time
  28753819 ± 39%  +1054.5%  3.319e+08 ± 49%  cpuidle.C3.time

    175049 ± 72%   +840.7%    1646720 ± 72%  sched_debug.cpu.sched_goidle.stddev


Thanks,
Yang


> So no, I don't want just some limit to hide this problem for that
> particular test. There's something else going on.
>
>                Linus
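
For reference, the kind of workload discussed above (growing and then shrinking the program break by a single 4K page in a tight loop) could be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the actual will-it-scale test source: the helper name brk_grow_shrink() is hypothetical, and the sketch is single-threaded, whereas the reported per_thread_ops number comes from many threads sharing one mm.

```c
#define _DEFAULT_SOURCE   /* exposes brk()/sbrk() in glibc headers */
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not taken from
 * will-it-scale): grow the program break by one page, touch it, then
 * shrink it back, `rounds` times.
 *
 * Returns the number of brk() calls that succeeded, or -1 if the page
 * size or the initial break could not be read.
 */
long brk_grow_shrink(long rounds)
{
    long page = sysconf(_SC_PAGESIZE);
    char *base = sbrk(0);              /* current program break */
    long ok = 0;

    if (page <= 0 || base == (char *)-1)
        return -1;

    for (long i = 0; i < rounds; i++) {
        if (brk(base + page) == 0) {   /* grow by one page */
            base[0] = 1;               /* fault the page in so there is something to zap */
            ok++;
        }
        if (brk(base) == 0)            /* shrink by one page; per the thread, the
                                          shrink path is where mmap_sem is downgraded */
            ok++;
    }
    return ok;
}
```

Calling brk_grow_shrink(N) from several threads of one process would be closer to the contended case, but concurrent callers would then race on the break value, so a real test needs per-thread address ranges or similar care.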