Re: [resend] Timer broadcast question

From: Santosh Shilimkar
Date: Thu Feb 21 2013 - 04:13:27 EST

On Thursday 21 February 2013 02:31 PM, Daniel Lezcano wrote:
On 02/21/2013 07:19 AM, Santosh Shilimkar wrote:
On Tuesday 19 February 2013 11:51 PM, Daniel Lezcano wrote:
On 02/19/2013 07:10 PM, Thomas Gleixner wrote:
On Tue, 19 Feb 2013, Daniel Lezcano wrote:
I am working on identifying the different wakeup sources from the
interrupts and I have a question regarding the timer broadcast.

The broadcast timer is set up for the next event, and that will wake up
an idle CPU belonging to the "broadcast cpumask", right?

The CPU which has been woken up will look up the next event of each CPU
and send an IPI to wake up those whose event has expired.

However, it is possible that the sender of this IPI is not itself
concerned by the timer expiration and has been woken up just to send
the IPI, right?

If this is correct, is it possible to set the timer IRQ affinity to a
CPU which is concerned by the timer expiration, so that we avoid an
unnecessary wakeup of another CPU?
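
To make the scenario above concrete, here is a minimal user-space sketch
in Python (not kernel code). It models a single expiry of the broadcast
timer and compares routing the interrupt to a fixed CPU against routing
it to the CPU that owns the earliest event. The names `broadcast_expiry`
and `Wakeup` and the time units are hypothetical and only serve the
illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Wakeup(NamedTuple):
    handler_cpu: int          # CPU that receives the broadcast timer interrupt
    ipi_targets: List[int]    # CPUs the handler has to wake with an IPI
    spurious: bool            # True if the handler had no expired event itself


def broadcast_expiry(next_event: Dict[int, int],
                     irq_cpu: Optional[int] = None) -> Wakeup:
    """Model one expiry of the broadcast timer (illustrative only).

    next_event maps each idle CPU in the broadcast mask to its next event
    time in arbitrary units. irq_cpu is the CPU the timer interrupt is
    routed to; None means "route it to a CPU owning the earliest event".
    """
    if not next_event:
        raise ValueError("broadcast mask is empty")
    # The broadcast device is armed for the earliest pending event.
    now = min(next_event.values())
    expired = sorted(cpu for cpu, t in next_event.items() if t <= now)
    handler = expired[0] if irq_cpu is None else irq_cpu
    # Every expired CPU other than the handler needs an IPI.
    ipis = [cpu for cpu in expired if cpu != handler]
    return Wakeup(handler, ipis, handler not in expired)
```

In this model, with next events `{0: 500, 1: 100}` and the interrupt
fixed on CPU0, CPU0 is woken only to send one IPI to CPU1. With dynamic
routing, CPU1 handles its own event and no IPI is needed.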

It is possible, but we never implemented it.

If we go there, we want to make that conditional on a property flag,
because some interrupt controllers, especially on x86, only allow
moving the affinity from interrupt context, which makes this pointless.

Thanks, Thomas, for your quick answer. I will write an RFC patchset.

Last year I implemented the affinity hook for the broadcast code and
experimented with it. Since the system I was using was dual-core,
it wasn't very beneficial, and hence I gave up on it later. I remember
discussing the approach with a few folks at the conference.

I did a brief test with a similar patch on an ARM u8500 board. The timer
is tied to CPU0 by default; setting the IRQ affinity dynamically
considerably reduces the number of IPIs. The difference from your patch
is that the affinity is set to a single CPU, the first one which is
supposed to be woken up by the timer expiration.

This is easy to spot with a small program that calls usleep while
pinned to CPU1.

We see CPU0 waking up to send an IPI to CPU1 and then going idle again.
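
A test program of this kind might look like the following sketch. It
assumes Linux, where Python exposes `os.sched_setaffinity`; the function
name `pin_and_sleep` and its parameters are made up for illustration,
and this is not the program used in the test described above.

```python
import os
import time


def pin_and_sleep(cpu: int, interval_us: int, iterations: int) -> int:
    """Pin the calling process to `cpu`, then sleep repeatedly.

    Each sleep arms a timer for the pinned CPU, so that CPU goes idle
    and has to be woken again when the timer expires. Returns the number
    of sleeps completed. Raises OSError if `cpu` is not available to
    this process.
    """
    os.sched_setaffinity(0, {cpu})   # Linux only; pid 0 means "this process"
    completed = 0
    for _ in range(iterations):
        time.sleep(interval_us / 1_000_000)
        completed += 1
    return completed
```

Calling `pin_and_sleep(1, 10_000, 1000)` on an otherwise idle system
while watching the per-CPU counters in /proc/interrupts should make the
extra wakeups on CPU0 visible, assuming the timer interrupt is routed
there.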

I don't know how OMAP4 (which I guess is the board you used) behaves
with this patch, but the coupled idle state traces could be ambiguous
if you relied on them to check the benefit of this patch.

Across OMAP4- and OMAP5-based devices, the approach was useful only on
the general-purpose OMAP5 devices. The rest of the devices had the
constraint that the master CPU (CPU0) must always wake up first, which
in turn means always pinning the affinity to that CPU, which the
current code already does. That was another reason I didn't pursue it
further.

IMO, it is worth implementing such a solution, and perhaps we can extend
it to optimize the package idle time with the generic power domain tied
to the IRQ. Anyway, it is a random thought; let's see that later :)

It is surely a good optimization, especially for multi-core CPUIdle.

The patch for the generic broadcast code is at the end of the email
(also attached). I didn't look at all the corner cases, though. In the
arch code you then need to set up the "broadcast_affinity" hook, which
should be able to get a handle on the arch irqchip and call the
respective affinity handler. Just a three-line function should do the
trick.

As Thomas said, the effectiveness of such an optimization depends solely
on how well affinity changes (in low-power states) are supported by your
IRQ chip.

Hope this is helpful for you.

Thanks a lot for your patch and your feedback.

Am glad that it was helpful.

