Re: [PATCH 00/10] sched: EEVDF using latency-nice

From: K Prateek Nayak
Date: Wed Mar 22 2023 - 05:38:35 EST


Hello Peter,

One important detail I forgot to mention: When I picked eevdf commits
from your tree
(https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=sched/core),
they were based on v6.3-rc1 with the sched/eevdf HEAD at:

commit: 0dddbc0b54ad ("sched/fair: Implement an EEVDF like policy")

On 3/22/2023 12:19 PM, K Prateek Nayak wrote:
> Hello Peter,
>
> Leaving some results from my testing on a dual socket Zen3 machine
> (2 x 64C/128T) below.
>
> tl;dr
>
> o I've not tested workloads with nice and latency nice yet, focusing more
> on the out-of-the-box performance. No changes to sched_feat were made
> for the same reason.
>
> o Except for hackbench (m:n communication relationship), I do not see any
> regression for other standard benchmarks (mostly 1:1 or 1:n relation)
> when the system is below fully loaded.
>
> o In the fully loaded scenario, schbench seems to be unhappy. Looking at the
> data from /proc/<pid>/sched for the tasks with schedstats enabled,
> there is an increase in the number of context switches and the total wait
> sum. When the system is overloaded, things flip and the schbench tail
> latency improves drastically. I suspect the involuntary
> context-switches help workers make progress much sooner after wakeup
> compared to tip, thus leading to lower tail latency.
>
> o For the same reason as above, tbench throughput takes a hit, with the
> number of involuntary context-switches increasing drastically for the
> tbench server. There is also an increase in wait sum noticed.
>
> o A couple of real-world workloads were also tested. DeathStarBench
> throughput tanks much more with the updated version in your tree
> compared to this series as is.
> SpecJBB Max-jOPS sees large improvements but comes at the cost of a
> drop in Critical-jOPS, signifying an increase in either wait time
> or involuntary context-switches, which can lead to
> transactions taking longer to complete.
>
> o Apart from DeathStarBench, all the trends reported remain the same
> when comparing the version in your tree and this series, as is, applied
> on the same base kernel.
>
> I'll leave the detailed results below and some limited analysis.
>
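
For anyone wanting to compare the same counters, here is a minimal
sketch of pulling the context-switch counts and the wait sum out of
/proc/<pid>/sched. The helper names (parse_sched, pick, switch_summary)
are made up for illustration. The exact field names are an assumption:
some kernels print the schedstats fields with a prefix (for example
"se.statistics.wait_sum") and others print them bare, so the lookup
matches on the last dotted component. wait_sum only shows up when
schedstats is enabled.

```python
from pathlib import Path


def parse_sched(text):
    """Parse the 'key : value' lines of /proc/<pid>/sched into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if not sep:
            continue  # separator line or anything without a colon
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            continue  # header line and other non-numeric fields
    return stats


def pick(stats, name):
    """Return the first field whose last dotted component equals `name`."""
    for key, value in stats.items():
        if key.split(".")[-1] == name:
            return value
    return None  # field absent, e.g. schedstats disabled


def switch_summary(pid="self"):
    """Collect the counters discussed above for one task (Linux only)."""
    stats = parse_sched(Path(f"/proc/{pid}/sched").read_text())
    wanted = ("nr_voluntary_switches", "nr_involuntary_switches", "wait_sum")
    return {name: pick(stats, name) for name in wanted}
```

Calling switch_summary(pid) before and after a run and diffing the two
dicts gives the per-task deltas; a missing field comes back as None
rather than raising.
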
> On 3/6/2023 6:55 PM, Peter Zijlstra wrote:
>> Hi!
>>
>> Ever since looking at the latency-nice patches, I've wondered if EEVDF would
>> not make more sense, and I did point Vincent at some older patches I had for
>> that (which is where his augmented rbtree thing comes from).
>>
>> Also, since I really dislike the dual tree, I also figured we could dynamically
>> switch between an augmented tree and not (and while I have code for that,
>> that's not included in this posting because with the current results I don't
>> think we actually need this).
>>
>> Anyway, since I'm somewhat under the weather, I spent last week desperately
>> trying to connect a small cluster of neurons in defiance of the snot overlord
>> and bring back the EEVDF patches from the dark crypts where they'd been
>> gathering cobwebs for the past 13 odd years.
>>
>> By friday they worked well enough, and this morning (because obviously I forgot
>> the weekend is ideal to run benchmarks) I ran a bunch of hackbench, netperf,
>> tbench and sysbench -- there's a bunch of wins and losses, but nothing that
>> indicates a total fail.
>>
>> ( in fact, some of the schbench results seem to indicate EEVDF schedules a lot
>> more consistently than CFS and has a bunch of latency wins )
>>
>> ( hackbench also doesn't show the augmented tree and generally more expensive
>> pick to be a loss, in fact it shows a slight win here )
>>
>>
>> hackbench load + cyclictest --policy other results:
>>
>>
>>                                EEVDF  CFS
>>
>>               # Min Latencies: 00053
>> LNICE(19)     # Avg Latencies: 04350
>>               # Max Latencies: 76019
>>
>>               # Min Latencies: 00052  00053
>> LNICE(0)      # Avg Latencies: 00690  00687
>>               # Max Latencies: 14145  13913
>>
>>               # Min Latencies: 00019
>> LNICE(-19)    # Avg Latencies: 00261
>>               # Max Latencies: 05642
>>
>
> Following are the results from testing the series on a dual socket
> Zen3 machine (2 x 64C/128T):
>
> NPS Modes are used to logically divide a single socket into
> multiple NUMA regions.
> Following is the NUMA configuration for each NPS mode on the system:
>
> NPS1: Each socket is a NUMA node.
> Total 2 NUMA nodes in the dual socket machine.
>
> Node 0: 0-63, 128-191
> Node 1: 64-127, 192-255
>
> NPS2: Each socket is further logically divided into 2 NUMA regions.
> Total 4 NUMA nodes exist over 2 sockets.
>
> Node 0: 0-31, 128-159
> Node 1: 32-63, 160-191
> Node 2: 64-95, 192-223
> Node 3: 96-127, 224-255
>
> NPS4: Each socket is logically divided into 4 NUMA regions.
> Total 8 NUMA nodes exist over 2 sockets.
>
> Node 0: 0-15, 128-143
> Node 1: 16-31, 144-159
> Node 2: 32-47, 160-175
> Node 3: 48-63, 176-191
> Node 4: 64-79, 192-207
> Node 5: 80-95, 208-223
> Node 6: 96-111, 224-239
> Node 7: 112-127, 240-255
>
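
As a cross-check of the node-to-CPU mapping, here is a small sketch
that derives the CPU ranges for each NPS mode. It assumes the
enumeration seen in the NPS1 listing: the first SMT thread of core c
is CPU c, and its sibling is CPU c + 128 (the total core count). The
function name nps_layout and its parameters are made up for
illustration; other machines may enumerate CPUs differently.

```python
def nps_layout(nodes_per_socket, sockets=2, cores_per_socket=64):
    """Return, per NUMA node, ((first, last), (first_sibling, last_sibling)).

    Assumes cores are numbered contiguously across sockets and that the
    SMT sibling of core c is CPU c + total_cores.
    """
    total_cores = sockets * cores_per_socket
    width = cores_per_socket // nodes_per_socket  # cores per NUMA node
    layout = []
    for node in range(sockets * nodes_per_socket):
        first = node * width
        last = first + width - 1
        layout.append(((first, last),
                       (first + total_cores, last + total_cores)))
    return layout


def format_layout(layout):
    """Render the layout in the 'Node N: a-b, c-d' form used above."""
    return "\n".join(
        f"Node {n}: {a[0]}-{a[1]}, {b[0]}-{b[1]}"
        for n, (a, b) in enumerate(layout)
    )
```

print(format_layout(nps_layout(1))) reproduces the two NPS1 lines;
passing 2 or 4 gives the NPS2 and NPS4 layouts under the same
assumption.
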
> Kernel versions:
> - tip: 6.2.0-rc6 tip sched/core
> - eevdf: 6.2.0-rc6 tip sched/core
> + eevdf commits from your tree
> (https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/eevdf)

I had cherry picked the following commits for eevdf:

commit: b84a8f6b6fa3 ("sched: Introduce latency-nice as a per-task attribute")
commit: eea7fc6f13b4 ("sched/core: Propagate parent task's latency requirements to the child task")
commit: a143d2bcef65 ("sched: Allow sched_{get,set}attr to change latency_nice of the task")
commit: d9790468df14 ("sched/fair: Add latency_offset")
commit: 3d4d37acaba4 ("sched/fair: Add sched group latency support")
commit: 707840ffc8fa ("sched/fair: Add avg_vruntime")
commit: 394af9db316b ("sched/fair: Remove START_DEBIT")
commit: 89b2a2ee0e9d ("sched/fair: Add lag based placement")
commit: e3db9631d8ca ("rbtree: Add rb_add_augmented_cached() helper")
commit: 0dddbc0b54ad ("sched/fair: Implement an EEVDF like policy")

from the sched/eevdf branch in your tree onto the tip branch back when
I started testing. I notice some more changes have been added since then.
I have queued testing of the latest changes on the updated tip:sched/core
based on v6.3-rc3. I was able to cherry pick the latest commits from
sched/eevdf cleanly.

>
> - eevdf prev: 6.2.0-rc6 tip sched/core + this series as is
>
> When the testing started, the tip was at:
> commit 7c4a5b89a0b5 "sched/rt: pick_next_rt_entity(): check list_entry"
> [..snip..]
>
--
Thanks and Regards,
Prateek