Re: [BUG nohz]: wrong user and system time accounting

From: Wanpeng Li
Date: Tue Apr 11 2017 - 07:43:54 EST


2017-04-11 19:36 GMT+08:00 Peter Zijlstra <peterz@xxxxxxxxxxxxx>:
> On Tue, Apr 11, 2017 at 07:03:17PM +0800, Wanpeng Li wrote:
>> 2017-03-30 21:38 GMT+08:00 Frederic Weisbecker <fweisbec@xxxxxxxxx>:
>> > On Thu, Mar 30, 2017 at 02:47:11PM +0800, Wanpeng Li wrote:
>>
>> [...]
>>
>> >
>> >>
>> >> -------------------------------------->8-----------------------------------------------------
>> >>
>> >> use nanosecond granularity to check deltas but only perform an actual
>> >> cputime update when that delta >= TICK_NSEC.
>> >>
>> >> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
>> >> index f3778e2b..f1ee393 100644
>> >> --- a/kernel/sched/cputime.c
>> >> +++ b/kernel/sched/cputime.c
>> >> @@ -676,18 +676,21 @@ void thread_group_cputime_adjusted(struct
>> >> task_struct *p, u64 *ut, u64 *st)
>> >> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
>> >> static u64 vtime_delta(struct task_struct *tsk)
>> >> {
>> >> - unsigned long now = READ_ONCE(jiffies);
>> >> + u64 now = local_clock();
>> >
>> > I fear we need a global clock, because the reader (task_cputime()) needs
>> > to compute the delta and therefore use the same clock from any CPU.
>> >
>> > Or we can use the local_clock() but the reader must access the same.
>> >
>> > So there would be vtime_delta_writer() which uses local_clock and stores
>> > the current CPU to tsk->vtime_cpu (under the vtime_seqcount). And then
>> > vtime_delta_reader() which calls sched_clock_cpu(tsk->vtime_cpu) which
>> > is protected by vtime_seqcount as well.
>> >
>> > Although those sched_clock_cpu() things seem to only matter when the
>> > sched_clock() is unstable. And that stability is a condition for nohz_full
>> > to work anyway. So probably sched_clock() alone would be enough.
>>
>> I observed ~60% user time and ~40% sys time when replace local_clock()
>> above by sched_clock()(two cpu hogs on the cpu in nohz_full mode). In
>> addition, Luiz's testcast ./acct-bug 1 995 will show 100% idle time.
>> If keep local_clock() in vtime_delta(), cpu hogs testcase will
>> success. However, Luiz's testcase still show 100% idle time.
>
> Assuming a stable TSC, there should be no difference between
> local_clock() and sched_clock().

So it is weird. I did't see any unstable tsc dump in dmesg.

Regards,
Wanpeng Li