Re: [PATCH][RFC] specific do_timer_cpu value for nohz off mode

From: Jiri Bohac
Date: Tue Mar 19 2013 - 13:03:15 EST


Hi,

following up on a very old thread:
http://thread.gmane.org/gmane.linux.kernel/1212777

On Thu, Feb 16, 2012 at 08:59:00AM -0600, Dimitri Sivanich wrote:
> On Wed, Feb 15, 2012 at 09:36:47PM +0100, Thomas Gleixner wrote:
> > On Wed, 15 Feb 2012, Dimitri Sivanich wrote:
> > > On Wed, Feb 15, 2012 at 03:52:06PM +0100, Thomas Gleixner wrote:
> > > > So the first CPU which registers a clock event device takes it. That's
> > > > the boot CPU, no matter what.
> > > >
> > > Both kernel tracing and the original patch that I proposed for this
> > > showed plainly (at the time) that the tick_do_timer_cpu was not always cpu 0
> > > prior to modifying it for nohz=off. Maybe that is no longer the case?
> >
> > This logic has not been changed in years.
>
> I did some tracing of all points where tick_do_timer_cpu is set in the
> 3.3.0-rc3+ kernel.
>
> >
> > tick_do_timer_cpu is initialized to TICK_DO_TIMER_BOOT and the first
> > cpu which registers either a global or a per cpu clock event device
> > takes it over. This is at least on x86 always the boot cpu, i.e. cpu0.
> > After that point nothing touches that variable when nohz is disabled
> > (runtime or compile time).
>
> At that point it is set to cpu 0. However, when we go into highres mode
> it does change. Below are the two places it was set during boot with
> nohz=off on one of our x86 based machines.
>
> [ 0.000000] tick_setup_device: tick_do_timer_cpu 0
> [ 1.924098] tick_broadcast_setup_oneshot: tick_do_timer_cpu 17
>
> So in this example it's now cpu 17, and it stays that way from that point on.
>
> A traceback at that point shows tick_init_highres is indeed initiating this:
>
> [ 1.924863] [<ffffffff81087e01>] tick_broadcast_setup_oneshot+0x71/0x160
> [ 1.924863] [<ffffffff81087f23>] tick_broadcast_switch_to_oneshot+0x33/0x50
> [ 1.924863] [<ffffffff81088841>] tick_switch_to_oneshot+0x81/0xd0
> [ 1.924863] [<ffffffff810888a0>] tick_init_highres+0x10/0x20
> [ 1.924863] [<ffffffff81061e71>] hrtimer_run_pending+0x71/0xd0
>
> >
> > So I really want to see proper proof why that would not be the
> > case. If it really happens then we fix the root cause instead of
> > adding random sysfs interfaces.

As Dimitri wrote above, the switch from cpu 0 is done by
tick_broadcast_setup_oneshot. The first CPU switching to highres
takes the broadcast responsibility and also sets
tick_do_timer_cpu to itself.
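
For illustration only, here is a rough user-space model of that sequence.
It is not kernel code: the model_* helpers and the take_do_timer switch
are hypothetical names, and the bodies are simplified assumptions that
only track who ends up owning tick_do_timer_cpu.

```c
/* User-space model, NOT kernel code. Function bodies are simplified
 * assumptions; only the ownership of tick_do_timer_cpu is tracked. */
#define TICK_DO_TIMER_BOOT (-2)

static int tick_do_timer_cpu = TICK_DO_TIMER_BOOT;
static int oneshot_broadcast_set_up;

/* The first CPU to register a clock event device takes the do_timer duty. */
static void model_tick_setup_device(int cpu)
{
	if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT)
		tick_do_timer_cpu = cpu;
}

/* The first CPU to switch to highres sets up the oneshot broadcast, once.
 * take_do_timer != 0 models the current code, 0 models the assignment
 * being dropped. */
static void model_tick_broadcast_setup_oneshot(int cpu, int take_do_timer)
{
	if (oneshot_broadcast_set_up)
		return;
	oneshot_broadcast_set_up = 1;

	if (take_do_timer)
		tick_do_timer_cpu = cpu;
}

/* Replay a boot: cpu 0 registers first, then first_highres_cpu is the
 * first to switch to highres. Returns the final do_timer owner. */
static int model_boot(int first_highres_cpu, int take_do_timer)
{
	tick_do_timer_cpu = TICK_DO_TIMER_BOOT;
	oneshot_broadcast_set_up = 0;

	model_tick_setup_device(0);
	model_tick_broadcast_setup_oneshot(first_highres_cpu, take_do_timer);

	return tick_do_timer_cpu;
}
```

With these assumptions, model_boot(17, 1) ends with cpu 17 as the owner,
as in Dimitri's trace, while model_boot(17, 0) leaves it on cpu 0.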

This behaviour was introduced by commit 7300711e
("clockevents: broadcast fixup possible waiters").

I don't see a good reason to assign tick_do_timer_cpu to the CPU
doing the one-shot timer broadcasts. The timer interrupt will be
generated on any other CPU as well, be it through the broadcast
IPI or a per-CPU clockevent device. Any online CPU can do that
job, so how about just dropping the assignment?
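
To make the "any online CPU can do that job" point concrete, here is a
minimal sketch, again a user-space model rather than kernel code. It
assumes every CPU receives every tick and only the owner does the jiffies
update; MODEL_NR_CPUS and model_run_ticks() are made-up names.

```c
/* User-space model, NOT kernel code. Assumes every CPU sees every tick,
 * whether via broadcast IPI or a per-CPU clockevent device. */
#define MODEL_NR_CPUS 32

/* Returns the jiffies count after `ticks` periods when CPU `owner`
 * holds the do_timer duty. */
static unsigned long model_run_ticks(int owner, unsigned long ticks)
{
	unsigned long jiffies = 0;
	unsigned long t;
	int cpu;

	for (t = 0; t < ticks; t++) {
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
			/* Only the owner updates jiffies; this increment
			 * stands in for do_timer(1). */
			if (cpu == owner)
				jiffies++;
		}
	}

	return jiffies;
}
```

In this model the result depends only on the number of ticks, not on
which CPU is the owner. It says nothing about timing jitter.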

The do_timer() code should not suffer from the jitter introduced
by the interrupt being generated by the broadcast, should it?

Signed-off-by: Jiri Bohac <jbohac@xxxxxxx>

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -572,9 +572,6 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 
 		bc->event_handler = tick_handle_oneshot_broadcast;
 
-		/* Take the do_timer update */
-		tick_do_timer_cpu = cpu;
-
 		/*
 		 * We must be careful here. There might be other CPUs
 		 * waiting for periodic broadcast. We need to set the



--
Jiri Bohac <jbohac@xxxxxxx>
SUSE Labs, SUSE CZ
