Re: [PATCH 0/7] introduce cpu.headroom knob to cpu controller
From: Song Liu
Date:  Mon Apr 22 2019 - 19:22:52 EST
Hi Vincent,
> On Apr 17, 2019, at 5:56 AM, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
> 
> On Wed, 10 Apr 2019 at 21:43, Song Liu <songliubraving@xxxxxx> wrote:
>> 
>> Hi Morten,
>> 
>>> On Apr 10, 2019, at 4:59 AM, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
>>> 
> 
>>> 
>>> The bit that isn't clear to me, is _why_ adding idle cycles helps your
>>> workload. I'm not convinced that adding headroom gives any latency
>>> improvements beyond watering down the impact of your side jobs. AFAIK,
>> 
>> We think the latency improvements actually come from watering down the
>> impact of side jobs. This does not just improve the average latency
>> statistically; it also reduces the resource contention caused by the
>> side workload. I don't know whether it comes from reduced contention on
>> ALUs, memory bandwidth, CPU caches, or something else, but we saw lower
>> latencies when headroom was used.
>> 
>>> the throttling mechanism effectively removes the throttled tasks from
>>> the schedule according to a specific duty cycle. When the side job is
>>> not throttled the main workload is experiencing the same latency issues
>>> as before, but by dynamically tuning the side job throttling you can
>>> achieve a better average latency. Am I missing something?
>>> 
>>> Have you looked at your distribution of main job latency and tried to
>>> compare with when throttling is active/not active?
>> 
>> cfs_bandwidth adjusts the allowed runtime of each task_group every period
>> (configurable, 100ms by default). The cpu.headroom logic applies gentle
>> throttling, so that the side workload gets some runtime in every period.
>> Therefore, if we look at a time window of 100ms or longer, we don't
>> really see separate "throttling active" and "throttling inactive" phases.
>> 
>>> 
>>> I'm wondering if the headroom solution is really the right solution for
>>> your use-case or if what you are really after is something which is
>>> lower priority than just setting the weight to 1. Something that
>> 
>> The experiments show that cpu.weight does its job for priority: the
>> main workload gets priority on the CPU, while the side workload only
>> fills idle CPU. However, this is not sufficient, as the side workload
>> still creates enough contention to impact the main workload.
>> 
>>> (nearly) always gets pre-empted by your main job (SCHED_BATCH and
>>> SCHED_IDLE might not be enough). If your main job consist
>>> of lots of relatively short wake-ups things like the min_granularity
>>> could have significant latency impact.
>> 
>> cpu.headroom gives benefits in addition to optimizations on the
>> preemption side. By maintaining some idle time, fewer preemptions are
>> necessary, so the main workload gets better latency.
> 
> I agree with Morten's proposal. SCHED_IDLE should help your latency
> problem because the side job will be preempted directly, unlike a normal
> cfs task, even one with the lowest priority.
> In addition to min_granularity, sched_period also has an impact on the
> time that a task has to wait before preempting the running task. Also,
> some sched_feature like GENTLE_FAIR_SLEEPERS can also impact the
> latency of a task.
> 
> It would be nice to know whether the latency problem comes from contention
> on cache resources or mainly from your main load waiting before it runs
> on a CPU.
> 
> Regards,
> Vincent
Thanks for these suggestions. Here are some more tests to show the impact 
of scheduler knobs and cpu.headroom.
side-load | cpu.headroom | side weight/policy | min_gran | cpu-idle | main latency (norm.)
------------------------------------------------------------------------------------------
  none    |      0%      |        n/a         |   1 ms   |  45.20%  |   1.00
 ffmpeg   |      0%      |    cpu.weight 1    |  10 ms   |   3.38%  |   1.46
 ffmpeg   |      0%      |     SCHED_IDLE     |   1 ms   |   5.69%  |   1.42
 ffmpeg   |     20%      |     SCHED_IDLE     |   1 ms   |  19.00%  |   1.13
 ffmpeg   |     30%      |     SCHED_IDLE     |   1 ms   |  27.60%  |   1.08
In all these cases, the main workload is loaded with the same level of
traffic (requests per second). Main workload latency numbers are normalized
to the baseline (first row).
For the baseline, the main workload runs without any side workload; the
system has about 45.20% idle CPU.
The next two rows compare the impact of the scheduling knobs cpu.weight and
sched_min_granularity. With a cpu.weight of 1 and min_granularity of 10ms,
we see a latency of 1.46; with SCHED_IDLE and min_granularity of 1ms, we
see a latency of 1.42. So SCHED_IDLE and min_granularity help protect
the main workload. However, they are not sufficient, as the latency
overhead is still high (>40%).
The last two rows show the benefit of cpu.headroom. With 20% headroom,
the latency is 1.13; with 30% headroom, the latency is 1.08.
We can also see a clear correlation between latency and global idle CPU:
more idle CPU yields lower latency.
Overall, these results show that cpu.headroom provides an effective
mechanism to control the latency impact of side workloads. Other knobs
could also help the latency, but they are not as effective or flexible
as cpu.headroom.
Does this analysis address your concern? 
Thanks,
Song
> 
>> 
>> Thanks,
>> Song
>> 
>>> 
>>> Morten
>>