Re: [RFC PATCH 0/4] Reduce worst-case scanning of runqueues in select_idle_sibling

From: Li, Aubrey
Date: Mon Dec 07 2020 - 21:08:42 EST


On 2020/12/7 23:42, Mel Gorman wrote:
> On Mon, Dec 07, 2020 at 04:04:41PM +0100, Vincent Guittot wrote:
>> On Mon, 7 Dec 2020 at 10:15, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> This is a minimal series to reduce the amount of runqueue scanning in
>>> select_idle_sibling in the worst case.
>>>
>>> Patch 1 removes SIS_AVG_CPU because it's unused.
>>>
>>> Patch 2 improves the hit rate of p->recent_used_cpu to reduce the amount
>>> of scanning. It should be relatively uncontroversial
>>>
>>> Patch 3-4 scans the runqueues in a single pass for select_idle_core()
>>> and select_idle_cpu() so runqueues are not scanned twice. It's
>>> a tradeoff because it benefits deep scans but introduces overhead
>>> for shallow scans.
>>>
>>> Even if patch 3-4 is rejected to allow more time for Aubrey's idle cpu mask
>>
>> patch 3 looks fine and doesn't collide with Aubrey's work. But I don't
>> like patch 4 which manipulates different cpumask including
>> load_balance_mask out of LB and I prefer to wait for v6 of Aubrey's
>> patchset which should fix the problem of possibly scanning twice busy
>> cpus in select_idle_core and select_idle_cpu
>>
>
> Seems fair, we can see where we stand after V6 of Aubrey's work. A lot
> of the motivation for patch 4 would go away if we managed to avoid calling
> select_idle_core() unnecessarily. As it stands, we can call it a lot from
> hackbench even though the chance of getting an idle core are minimal.
>

Sorry for the delay, I sent v6 out just now. Compared to v5, v6 follows Vincent's
suggestion to decouple the idle cpumask update from the stop_tick signal; that is,
the CPU is set in the idle cpumask every time it enters idle. This should address
Peter's concern about the Facebook tail-latency workload, as I didn't see
any regression in the schbench workload's 99.0000th percentile latency report.
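
To make the idea concrete, here is a toy user-space sketch in C. It is not
code from the v6 patchset: the names toy_enter_idle(), toy_mark_busy() and
toy_select_idle_cpu(), the single 64-bit mask, and the point at which a CPU
is cleared from the mask are all assumptions made for illustration. It only
shows the general shape: a CPU marks itself in the mask on every idle entry,
and the wakeup path scans only CPUs that are set in that mask.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64          /* toy limit: one bit per CPU in a 64-bit word */

static uint64_t idle_mask;  /* bit n set => CPU n is believed to be idle */

/* Hypothetical hook: called every time a CPU enters idle. */
static void toy_enter_idle(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    idle_mask |= (uint64_t)1 << cpu;
}

/*
 * Hypothetical hook: clears a CPU once it is known to be busy.
 * Where this is called from is an assumption of the toy model.
 */
static void toy_mark_busy(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    idle_mask &= ~((uint64_t)1 << cpu);
}

/*
 * Scan only CPUs that are both allowed and marked idle, starting at
 * 'target' and wrapping around. Returns a CPU number, or -1 if none.
 */
static int toy_select_idle_cpu(uint64_t allowed, int target)
{
    uint64_t candidates = idle_mask & allowed;

    assert(target >= 0 && target < NR_CPUS);
    for (int i = 0; i < NR_CPUS; i++) {
        int cpu = (target + i) % NR_CPUS;

        if (candidates & ((uint64_t)1 << cpu))
            return cpu;
    }
    return -1;
}
```

In this model the scan cost depends on how accurate the mask is, not on
whether the tick has been stopped, which is the decoupling described above.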

However, I also haven't seen any significant benefit so far; I probably need to
put more load on the system. I'll do more characterization of the uperf workload
to see if I can find anything.

Thanks,
-Aubrey