Re: too many timer retries happen when do local timer swtich with broadcast timer

From: Lorenzo Pieralisi
Date: Fri Feb 22 2013 - 05:28:46 EST


On Thu, Feb 21, 2013 at 10:19:18PM +0000, Thomas Gleixner wrote:
> On Thu, 21 Feb 2013, Santosh Shilimkar wrote:
> > On Thursday 21 February 2013 07:18 PM, Thomas Gleixner wrote:
> > > find below a completely untested patch, which should address that issue.
> > >
> > After looking at the thread, I tried to see the issue on OMAP and could
> > see the same issue as Jason.
>
> That's interesting. We have the same issue on x86 since 2007 and
> nobody noticed ever. It's basically the same problem there, but it
> seems that on x86 getting out of those low power states is way slower
> than the minimal reprogramming delta which is used to enforce the
> local timer to fire after the wakeup.
>
> I'm still amazed that as Jason stated a 1us reprogramming delta is
> sufficient to get this ping-pong going. I somehow doubt that, but
> maybe ARM is really that fast :)

It also depends on when the idle driver exits broadcast mode.
Certainly if that's the last thing it does before enabling IRQs, that
might help trigger the issue.

I am still a bit sceptical myself, and I will take advantage of Thomas'
knowledge on the subject, which is way deeper than mine BTW, to ask a
question. The thread started with the subject "too many retries...." and
here I have a doubt. If the fix is not applied, on the CPU affine to
the broadcast timer it is _normal_ to have local timer retries, since
the CPU forces the local timer to fire after min_delta_ns every time
the expired event was local to the CPU affine to the broadcast timer.

The problem, supposedly, is that the local timer does not have enough time
(sorry for the mouthful) to expire (fire) before IRQs are disabled and the
idle thread goes back to idle again. This means that we should notice a
mismatch between the number of broadcast timer IRQs and local timer IRQs on
the CPU affine to the broadcast timer IRQ (granted, we also have to take
into account "normal" local timer IRQs, as well as broadcast timer IRQs
meant to service, through an IPI, a local timer that expired on a CPU other
than the one running the broadcast IRQ handler).

I am not sure the sheer number of retries is a symptom of the problem
happening, but I might well be mistaken so I am asking.
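
To make the hypothesis above concrete, here is a toy user-space model of it.
It is not kernel code: the function, the struct and the parameters are all
made up for illustration, and it only encodes the assumption that the forced
local timer is lost whenever min_delta_ns is longer than the window in which
the CPU runs with IRQs enabled before going back to idle.

```c
/* Toy model of the hypothesis, not kernel code. All names are hypothetical. */
struct sim_result {
    unsigned broadcast_irqs; /* wakeups caused by the broadcast timer */
    unsigned local_irqs;     /* local timer IRQs that actually fired */
};

/*
 * min_delta_ns:  delta used to force the local timer after broadcast exit.
 * irq_window_ns: assumed time spent with IRQs enabled before the idle
 *                thread disables them and re-enters idle.
 * max_rounds:    stop counting after this many wakeups.
 */
struct sim_result simulate_wakeups(unsigned min_delta_ns,
                                   unsigned irq_window_ns,
                                   unsigned max_rounds)
{
    struct sim_result r = {0, 0};

    for (unsigned round = 0; round < max_rounds; round++) {
        /* The broadcast timer IRQ wakes the CPU up. */
        r.broadcast_irqs++;

        /* On broadcast exit the local timer is forced to min_delta_ns. */
        if (min_delta_ns <= irq_window_ns) {
            /* The local timer fires while IRQs are enabled: done. */
            r.local_irqs++;
            break;
        }
        /*
         * Otherwise the CPU is back in idle before the local timer
         * fires, and the same event comes back via the broadcast timer.
         */
    }
    return r;
}
```

In this model, simulate_wakeups(1000, 5000, 10) gives one broadcast IRQ and
one local IRQ, while simulate_wakeups(1000, 500, 10) gives ten broadcast IRQs
and no local IRQ, which is the kind of mismatch I would expect to see if the
problem is really happening.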

For certain, with the fix applied, lots of duplicate IRQs on the CPU
affine to the broadcast timer are eliminated, since the local timer is
no longer reprogrammed. Before the fix, the broadcast timer was basically
firing an IRQ that did nothing, since the CPU was already out of broadcast
mode by the time the broadcast handler was running; the real job was
carried out in the local timer handler.
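
The duplicate IRQ accounting described above can be sketched as follows.
Again this is a toy user-space model with hypothetical names, not the patch
itself: it assumes one expired event per wakeup on the CPU affine to the
broadcast timer, and that with the fix the event is handled from the
broadcast IRQ path.

```c
/* Toy IRQ accounting, not kernel code. All names are hypothetical. */
struct irq_counts {
    unsigned broadcast_irqs;
    unsigned local_irqs;
};

/* Count IRQs taken for `events` expired events, with or without the fix. */
struct irq_counts count_irqs(unsigned events, int fix_applied)
{
    struct irq_counts c = {0, 0};

    for (unsigned i = 0; i < events; i++) {
        /* The broadcast timer IRQ wakes the CPU up in both cases. */
        c.broadcast_irqs++;

        if (!fix_applied) {
            /*
             * Before the fix: the broadcast handler finds the CPU
             * already out of broadcast mode and does nothing for it;
             * the forced local timer IRQ does the real job.
             */
            c.local_irqs++;
        }
        /*
         * With the fix: the local timer is not reprogrammed, so no
         * second IRQ is taken for the same event.
         */
    }
    return c;
}
```

For 100 events this model counts 100 broadcast plus 100 local IRQs without
the fix, and 100 broadcast IRQs only with the fix.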

>
> > Your patch fixes the retries on both CPUs on my dual core machine. So
> > you use my tested by if you need one.
>
> They are always welcome.
>
> > Tested-by: Santosh Shilimkar <santosh.shilimkar@xxxxxx>

You can add mine too. We should also fix the WARN_ON_ONCE mentioned in
Santosh's reply.

Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
