Re: [PATCH] sched/fair: Clear target from cpus to scan in select_idle_cpu

From: Yicong Yang
Date: Thu Nov 25 2021 - 07:48:56 EST


On 2021/11/25 19:17, Mel Gorman wrote:
> On Wed, Nov 24, 2021 at 04:54:01PM +0800, Yicong Yang wrote:
>> Commit 56498cfb045d noticed that "When select_idle_cpu starts scanning for
>> an idle CPU, it starts with a target CPU that has already been checked
>> by select_idle_sibling. This patch starts with the next CPU instead."
>> It only changed the scanning start cpu to target + 1 but still left
>> the target in the scanning cpumask. The target still has a chance to be
>> checked in the last turn. Fix this by clearing the target from the cpus
>> to scan.
>>
>> Fixes: 56498cfb045d ("sched/fair: Avoid a second scan of target in select_idle_cpu")
>> Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>
> Did you check the performance of this? When I tried something like this
> in a different context, I found that clearing the bit was more expensive
> than simply using target + 1. For the target to be rescanned, the whole
> mask would have to be scanned because no other CPUs are idle, which is
> the unlikely case. By clearing the bit, a cost is always incurred even
> if the first CPU scanned is idle.
>

Not yet; the change came from reading the code. I've launched some tests and
we'll see the results tomorrow.

We traced the scanning here, and the case where the whole LLC is scanned without
finding an idle CPU does occur in some proportion of scans. On a 4-NUMA, 128-core
Kunpeng 920 server tested with MySQL, there is a ~1% probability of not finding
an idle CPU when the number of sysbench threads is 128. The probability increases
as the load increases.
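
To make the scan order under discussion concrete, here is a minimal userspace
sketch, not the kernel code. It assumes an 8-CPU mask held in a uint32_t;
scan_wrap() and SKETCH_NR_CPUS are hypothetical names used only for
illustration, standing in for the wrap-around iteration that select_idle_cpu
performs from target + 1. The sketch records the full visit order and does not
model the early exit taken when an idle CPU is found.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 8	/* illustrative size, not a kernel constant */

/*
 * Hypothetical stand-in for a wrap-around mask walk: visit every set bit
 * of @mask beginning at @start and wrapping past the last CPU. The visit
 * order is written to @order and the number of CPUs visited is returned.
 */
static int scan_wrap(uint32_t mask, int start, int *order)
{
	int n = 0;

	for (int i = 0; i < SKETCH_NR_CPUS; i++) {
		int cpu = (start + i) % SKETCH_NR_CPUS;

		if (mask & (1u << cpu))
			order[n++] = cpu;
	}
	return n;
}

/* Return 1 if @cpu appears among the first @n entries of @order. */
static int order_contains(const int *order, int n, int cpu)
{
	for (int i = 0; i < n; i++)
		if (order[i] == cpu)
			return 1;
	return 0;
}
```

With all eight bits set and target = 3, scan_wrap(0xff, target + 1, order)
visits 4, 5, 6, 7, 0, 1, 2, 3, so the target is reached only as the last
entry, i.e. only when every other CPU in the mask was not idle. Clearing the
bit first (mask & ~(1u << target)) shortens the walk to seven CPUs and the
target never appears, at the price of one extra mask operation on every call.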