Re: [PATCH v3 0/5] Scan for an idle sibling in a single pass

From: Li, Aubrey
Date: Tue Jan 26 2021 - 00:54:37 EST


On 2021/1/25 17:04, Mel Gorman wrote:
> On Mon, Jan 25, 2021 at 12:29:47PM +0800, Li, Aubrey wrote:
>>>>> hackbench -l 2560 -g 1 on 8 cores arm64
>>>>> v5.11-rc4 : 1.355 (+/- 7.96)
>>>>> + sis improvement : 1.923 (+/- 25%)
>>>>> + the patch below : 1.332 (+/- 4.95)
>>>>>
>>>>> hackbench -l 2560 -g 256 on 8 cores arm64
>>>>> v5.11-rc4 : 2.116 (+/- 4.62%)
>>>>> + sis improvement : 2.216 (+/- 3.84%)
>>>>> + the patch below : 2.113 (+/- 3.01%)
>>>>>
>>
>> 4 benchmarks reported out during weekend, with patch 3 on a x86 4s system
>> with 24 cores per socket and 2 HT per core, total 192 CPUs.
>>
>> It looks like mid-load has notable changes on my side:
>> - netperf 50% num of threads in TCP mode has 27.25% improved
>> - tbench 50% num of threads has 9.52% regression
>>
>
> It's interesting that patch 3 would make any difference on x64 given that
> it's SMT2. The scan depth should have been similar. It's somewhat expected
> that it will not be a universal win, particularly once the utilisation
> is high enough to spill over in sched domains (25%, 50%, 75% utilisation
> being interesting on 4-socket systems). In such cases, double scanning can
> still show improvements for workloads that idle rapidly like tbench and
> hackbench even though it's expensive. The extra scanning gives more time
> for a CPU to go idle enough to be selected which can improve throughput
> but at the cost of wake-up latency,

aha, sorry for the confusion. Since you and Vincent discussed to drop
patch3, I just mentioned I tested 5 patches with patch3, not patch3 alone.

>
> Hopefully v4 can be tested as well which is now just a single scan.
>

Sure, may I know the baseline of v4?

Thanks,
-Aubrey