[patch 00/18] CFS Bandwidth Control v7.2

From: Paul Turner
Date: Thu Jul 21 2011 - 19:01:15 EST


Hi all,

Please find attached the incremental v7.2 for bandwidth control.

This release follows a fairly intensive period of scraping cycles across
various configurations. Unfortunately we currently seem to be taking an IPC
hit with jump_labels (despite a savings in branches/instructions retired) for
which, despite fairly extensive digging, I don't have a good explanation. The
emitted assembly /looks/ ok, but cycles/wall time is consistently higher
across several platforms.

As such I've demoted the jump-label patch to [RFT] while these details are
worked out; there's no point in holding up the rest of the series any longer.

[ Please find the specific discussion related to the above attached to patch
17/18. ]

So -- without jump labels -- the current performance looks like:

                       instructions          cycles                branches
-----------------------------------------------------------------------------------
clovertown [!BWC]      843695716             965744453             151224759
+unconstrained         845934117 (+0.27%)    974222228 (+0.88%)    152715407 (+0.99%)
+10000000000/1000:     855102086 (+1.35%)    978728348 (+1.34%)    154495984 (+2.16%)
+10000000000/1000000:  853981660 (+1.22%)    976344561 (+1.10%)    154287243 (+2.03%)

barcelona [!BWC]       810514902             761071312             145351489
+unconstrained         820573353 (+1.24%)    748178486 (-1.69%)    148161233 (+1.93%)
+10000000000/1000:     827963132 (+2.15%)    757829815 (-0.43%)    149611950 (+2.93%)
+10000000000/1000000:  827701516 (+2.12%)    753575001 (-0.98%)    149568284 (+2.90%)

westmere [!BWC]        792513879             702882443             143267136
+unconstrained         802533191 (+1.26%)    694415157 (-1.20%)    146071233 (+1.96%)
+10000000000/1000:     809861594 (+2.19%)    701781996 (-0.16%)    147520953 (+2.97%)
+10000000000/1000000:  809752541 (+2.18%)    705278419 (+0.34%)    147502154 (+2.96%)
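
[ The bracketed deltas are percentages relative to each platform's !BWC
  baseline; e.g. for the clovertown instruction count:

  $ awk 'BEGIN { printf "%+.2f%%\n", (845934117/843695716 - 1) * 100 }'
  +0.27% ]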

Under the workload:
mkdir -p /cgroup/cpu/test
echo $$ > /cgroup/cpu/test/tasks (only cpu,cpuacct mounted)
(W1) taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches bash -c "for ((i=0;i<5;i++)); do $(dirname $0)/pipe-test 20000; done"
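
[ For reference, a sketch of how the quota/period rows in the tables above
  would be configured, assuming the row labels are cpu.cfs_quota_us /
  cpu.cfs_period_us values (the interface this series adds), in usec:

  mount -t cgroup -o cpu,cpuacct none /cgroup            # only cpu,cpuacct mounted
  echo 10000000000 > /cgroup/cpu/test/cpu.cfs_quota_us   # quota, usec
  echo 1000 > /cgroup/cpu/test/cpu.cfs_period_us         # period, usec
  cat /cgroup/cpu/test/cpu.stat                          # nr_throttled should stay
                                                         # 0 at these settings ]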

This may seem a strange workload, but it works around some bizarro overheads
currently introduced by perf. Comparing, for example, with:
(W2) taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches bash -c "$(dirname $0)/pipe-test 100000;true"
(W3) taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches bash -c "$(dirname $0)/pipe-test 100000;"


We see:
                      instructions   cycles      branches    time elapsed (s)
(W1) westmere [!BWC]  792513879      702882443   143267136   0.197246943
(W2) westmere [!BWC]  912241728      772576786   165734252   0.214923134
(W3) westmere [!BWC]  904349725      882084726   162577399   0.748506065

vs an 'ideal' total exec time of (approximately):
$ time taskset -c 0 ./pipe-test 100000
real 0m0.198s  user 0m0.007s  sys 0m0.095s

The overhead in W2 is explained by the fact that when pipe-test is invoked
directly, one of the siblings becomes the perf_ctx parent, incurring lots of
pain every time we switch. I do not have a reasonable explanation as to why
(W1) is so much cheaper than (W2); I stumbled across it by accident while
trying combinations to reduce the <perf stat>-to-<perf stat> variance.

v7.2
-----------
- Build errors in the !CGROUP_SCHED case fixed
- !CONFIG_SMP now 'supported' (#ifdef munging)
- gcc was failing to inline account_cfs_rq_runtime, affecting performance
- checks in expire_cfs_rq_runtime() and check_enqueue_throttle() re-organized
  to save branches
- jump labels introduced to reduce inert overhead when BWC is not being used
  system-wide
- a branch saved when expiring runtime (conditionals reorganized)

Hidetoshi, the following patches have changed enough to require
re-confirmation of your Reviewed-by:
[patch 09/18] sched: add support for unthrottling group entities (extensive)
[patch 11/18] sched: prevent interactions with throttled entities (update_cfs_shares)
[patch 12/18] sched: prevent buddy interactions with throttled entities (new)


Previous postings:
-----------------
v7.1: https://lkml.org/lkml/2011/7/7/24
v7: http://lkml.org/lkml/2011/6/21/43
v6: http://lkml.org/lkml/2011/5/7/37
v5: http://lkml.org/lkml/2011/3/22/477
v4: http://lkml.org/lkml/2011/2/23/44
v3: http://lkml.org/lkml/2010/10/12/44
v2: http://lkml.org/lkml/2010/4/28/88
Original posting: http://lkml.org/lkml/2010/2/12/393

Prior approaches: http://lkml.org/lkml/2010/1/5/44 ["CFS Hard limits v5"]

Thanks,

- Paul
