Re: [PATCH 08/10] posix-cpu-timers: Migrate to use new tick dependency mask model

From: Frederic Weisbecker
Date: Mon Aug 03 2015 - 14:01:21 EST


On Mon, Aug 03, 2015 at 11:59:07AM -0400, Chris Metcalf wrote:
> On 07/31/2015 10:49 AM, Frederic Weisbecker wrote:
> >Instead of doing a per signal dependency, I'm going to use a per task
> >one. Which means that if a per-process timer is enqueued, every thread
> >of that process will have the tick dependency. But if the timer is
> >enqueued to a single thread, only the thread is concerned.
> >
> >We'll see if offloading becomes really needed. It's not quite free because
> >the housekeepers will have to poll on all nohz CPUs at a Hz frequency.
>
> Seems reasonable for now!
>
> Why would we need the Hz frequency polling, though? I would
> think it should be possible to just arrange it such that the timer
> for posix cpu timers would just always be placed either on the core
> that requested it, or if that core is nohz_full, on a housekeeping
> core. Then it would eventually fire from the housekeeping core,
> and the logic could be such that (for a process-wide timer) it
> would preferentially interrupt threads from that process that
> were running on the housekeeping cores. No polling.

But you need to periodically poll for timer expiration from a housekeeper.
It's not only about firing the timer, it's about elapsing it against the
target cputime.

Since there is no tick on a nohz full CPU to account the time spent by
the task, you must do that elsewhere. And if you don't poll at a sufficient
frequency, the time accounted is less precise (a quick round-trip to kernel space
can be missed if the polling frequency is too low). Or you can combine it
with VIRT_CPU_ACCOUNTING_GEN, which we are using currently and which records the
time spent in user and kernel space using hooks. Still, you must check periodically
that the timer hasn't expired, at an interval that doesn't go beyond the
expiration time. Easy in the case of a timer attached to a single task, but what
about a timer attached to a process? Its threads can consume cputime concurrently,
so you must poll at least every expiration/nr_threads, and you must handle thread
creation as well.
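
To put a number on the expiration/nr_threads bound, here is a rough userspace
sketch. The helper name and its parameters are made up for illustration, they
are not existing kernel functions, and it assumes the worst case where every
thread of the process runs on its own CPU the whole time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): longest wall-clock delay, in ns,
 * a housekeeper may wait before rechecking a process-wide CPU timer.
 *
 * Worst-case assumption: all nr_threads threads run concurrently, so the
 * process accumulates up to nr_threads ns of cputime per ns of wall time.
 */
static uint64_t max_poll_delay_ns(uint64_t remaining_cputime_ns,
				  unsigned int nr_threads)
{
	assert(nr_threads > 0);
	return remaining_cputime_ns / nr_threads;
}
```

With 1 ms of cputime left, a single thread allows a 1 ms delay, while four
threads only allow 250 us. If a thread is created in between, the delay already
programmed is too long and has to be recomputed, which is the thread creation
handling mentioned above.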

Offloading posix timers sounds like a big headache if we don't poll at Hz frequency.

That said, Rick has posted patches that offload cputime accounting. I'm not yet sure
this patchset is a good idea, but offloading posix timers can be done on top of that.

Another thing: now I recall why I turned posix timers into a global tick dependency.
In case of a per task/process dependency we still need the context switch hook, because
if we enqueue a timer on a sleeping task, the tick must be restarted when the task wakes
up. And that requires a check on context switch.
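
Roughly, the sleeping task case looks like the toy model below. This is plain
userspace C with invented struct and function names, not the actual scheduler
or tick code. It only shows why arming the timer is not enough and a check at
switch-in time is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for a task and a nohz full CPU; names are hypothetical. */
struct task_model {
	bool cputimer_armed;	/* task (or its process) has a CPU timer queued */
};

struct cpu_model {
	bool tick_running;	/* false while the tick is stopped */
};

/* Arming a timer on a sleeping task touches no CPU: nothing to kick yet. */
static void arm_cputimer(struct task_model *t)
{
	assert(t);
	t->cputimer_armed = true;
}

/* Check done when 'next' is switched in on 'cpu'. */
static void context_switch_hook(struct cpu_model *cpu,
				const struct task_model *next)
{
	assert(cpu && next);
	if (next->cputimer_armed && !cpu->tick_running)
		cpu->tick_running = true;	/* restart the tick for this task */
}
```

In this model the tick stays stopped after arm_cputimer() and is only restarted
once context_switch_hook() sees the task being scheduled in.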