Re: [PATCH] process cputimer is moving faster than itscorresponding clock

From: Olivier Langlois
Date: Wed Apr 10 2013 - 23:29:58 EST


On Wed, 2013-04-10 at 13:35 +0200, Peter Zijlstra wrote:
> On Fri, 2013-04-05 at 13:59 -0400, Olivier Langlois wrote:
> > Process timers are moving fasters than their corresponding
> > cpu clock for various reasons:
> >
> > 1. There is a race condition when getting a timer sample that makes the sample
> > be smaller than it should leading to setting the timer expiration to soon.
> > 2. When initializing the cputimer, by including tasks deltas in the initial
> > timer value, it makes them be counted twice.
> > 3. When a thread autoreap itself when exiting, the last context switch update
> > will update the cputimer and not the overall process values stored in
> > signal.
>
> Please explain these races. Things like task_sched_runtime() on which
> most of this stuff is build read both sum_exec_runtime and compute the
> delta while holding the rq->lock; this should avoid any and all races
> against update_curr() and the sort.
>
In my previous reply, I have explained in length the race condition but
I didn't realize that you were also mentioning my refactoring of
task_sched_runtime() so I comment a little bit more about this proposal:

currently:

- cputimer is initialized with the result of thread_group_cputime()
which is (accounted time + tasks deltas)
- cputimer sample value is then cputimer + 1 more task_delta_exec()
- After all active tasks pass through update_curr(), cputimer is
(accounted time + 2*(tasks deltas))

By being able to get separately get accounted time and delta, you can:

- Initialize cputimer to accounted time
- thread group cputimer sample will be cputimer + delta (which is
essentially equivalent to what would thread_group_cputime() return)
- After all the deltas are in by having called
account_group_exec_runtime(), cputimer will be set to (accounted time +
tasks delta) and have the exact same value of the corresponding process
clock.

In other words, currently the way the cputimer is initialized contribute
to make it advance faster than its corressponding process clock.

This part of the patch has nothing to do with race condition, as far as
I can tell, thread_group_cputime() and task_delta_exec() are rock solid.
It is just that you need delta and accounted time separately and
preferably atomically to be able to initialize posix cpu timer
correctly.

Greetings,
Olivier



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/