Re: [PATCH v3] sched/cputime: add steal time support to full dynticks CPU time accounting

From: Wanpeng Li
Date: Fri Jun 03 2016 - 04:49:47 EST


2016-06-03 15:16 GMT+08:00 Ingo Molnar <mingo@xxxxxxxxxx>:
>
> * Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>
>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>
>> This patch adds steal guest time support to full dynticks CPU
>> time accounting. After 'commit ff9a9b4c4334 ("sched, time: Switch
>> VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")', time is jiffy
>> based sampling even if it's still listened to ring boundaries, so
>> steal_account_process_tick() is reused to account how much 'ticks'
>> are steal time after the last accumulation.
>
> WTF? This changelog has 4 grammar errors and it sails through review just like
> that?
>
> 1) What does 'time is jiffy based sampling' mean?
> 2) what does 'even if it's still listened to ring boundaries' mean?
> 3) "how much 'ticks'"?
> 4) "are steal time"?
>
> So I fixed this to be at least parseable:
>
> This patch adds guest steal-time support to full dynticks CPU
> time accounting. After the following commit:
>
> ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")
>
> ... time sampling became jiffy based, even if it's still listened
> to ring boundaries, so steal_account_process_tick() is reused
> to account how many 'ticks' are stolen-time, after the last accumulation.
>

Thanks, Ingo, my fault!

> Although I'm still wondering what this key phrase means:
>
> even if it's still listened to ring boundaries,
>
> Could someone please explain what this means? (Beyond the 5th grammar error this
> portion has, which I'll fix once it actually makes sense to me...)

With vtime, the delta time is accounted through context tracking, which
hooks the boundaries between kernel and userspace (including syscall and
exception entry/exit).

>
> Furthermore, the real problem that made me go back and tear the changelog apart is
> that the code flow itself is incredibly ugly and fragile as hell:
>
>>  	write_seqcount_begin(&tsk->vtime_seqcount);
>>  	tsk->vtime_snap_whence = VTIME_SYS;
>>  	if (vtime_delta(tsk)) {
>> +		cputime_t steal_time;
>> +		unsigned long delta_st = steal_account_process_tick();
>>  		delta_cpu = get_vtime_delta(tsk);
>> +		steal_time = jiffies_to_cputime(delta_st);
>> +
>> +		if (steal_time >= delta_cpu) {
>> +			write_seqcount_end(&tsk->vtime_seqcount);
>> +			return;
>> +		}
>> +		delta_cpu -= steal_time;
>>  		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
>>  	}
>>  	write_seqcount_end(&tsk->vtime_seqcount);
>>  }
>
> Yeah, a return in the middle of a locking critical section, really??
>
> Also, how about basic style details like leaving an extra newline after local
> variable definition sections, like every other scheduler function does?
>
> Also, what's this thing about calling a time unit variable 'delta_cpu'? When I
> reviewed this one of my first reactions was: "Why are we comparing time to CPU
> ID??".
>
> Plus as an added bonus a 'delta_st' variable name to count ticks, which variable
> is not just badly named but single-use. WTF?
>
> Something like this looks much better and shorter:
>
> void vtime_account_user(struct task_struct *tsk)
> {
> 	cputime_t delta_time, steal_time;
>
> 	write_seqcount_begin(&tsk->vtime_seqcount);
> 	tsk->vtime_snap_whence = VTIME_SYS;
> 	if (vtime_delta(tsk)) {
> 		delta_time = get_vtime_delta(tsk);
> 		steal_time = jiffies_to_cputime(steal_account_process_tick());
>
> 		if (steal_time < delta_time) {
> 			delta_time -= steal_time;
> 			account_user_time(tsk, delta_time, cputime_to_scaled(delta_time));
> 		}
> 	}
> 	write_seqcount_end(&tsk->vtime_seqcount);
> }
>
> See the consistent, obvious naming of the variables and the clear code flow?

Yeah, thank you again, Ingo. I've cleaned up the whole patch as you suggested.

---

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..9ff036b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -257,7 +257,7 @@ void account_idle_time(cputime_t cputime)
 	cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }
 
-static __always_inline bool steal_account_process_tick(void)
+static __always_inline unsigned long steal_account_process_tick(void)
 {
 #ifdef CONFIG_PARAVIRT
 	if (static_key_false(&paravirt_steal_enabled)) {
@@ -279,7 +279,7 @@ static __always_inline bool steal_account_process_tick(void)
 		return steal_jiffies;
 	}
 #endif
-	return false;
+	return 0;
 }
 
 /*
@@ -691,9 +691,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)

 static void __vtime_account_system(struct task_struct *tsk)
 {
-	cputime_t delta_cpu = get_vtime_delta(tsk);
+	cputime_t delta_time = get_vtime_delta(tsk);
+	cputime_t steal_time = jiffies_to_cputime(steal_account_process_tick());
 
-	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
+	if (steal_time < delta_time) {
+		delta_time -= steal_time;
+		account_system_time(tsk, irq_count(), delta_time, cputime_to_scaled(delta_time));
+	}
 }
 
 void vtime_account_system(struct task_struct *tsk)
@@ -718,13 +722,18 @@ void vtime_gen_account_irq_exit(struct task_struct *tsk)

 void vtime_account_user(struct task_struct *tsk)
 {
-	cputime_t delta_cpu;
+	cputime_t delta_time, steal_time;
 
 	write_seqcount_begin(&tsk->vtime_seqcount);
 	tsk->vtime_snap_whence = VTIME_SYS;
 	if (vtime_delta(tsk)) {
-		delta_cpu = get_vtime_delta(tsk);
-		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+		delta_time = get_vtime_delta(tsk);
+		steal_time = jiffies_to_cputime(steal_account_process_tick());
+
+		if (steal_time < delta_time) {
+			delta_time -= steal_time;
+			account_user_time(tsk, delta_time, cputime_to_scaled(delta_time));
+		}
 	}
 	write_seqcount_end(&tsk->vtime_seqcount);
 }