Re: too many timer retries happen when do local timer switch with broadcast timer

From: Lorenzo Pieralisi
Date: Mon Feb 25 2013 - 08:34:14 EST


On Fri, Feb 22, 2013 at 06:52:14PM +0000, Thomas Gleixner wrote:
> On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote:
> > On Fri, Feb 22, 2013 at 03:03:02PM +0000, Thomas Gleixner wrote:
> > > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote:
> > > > On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote:
> > > > > Now we could make use of that and avoid going deep idle just to come
> > > > > back right away via the IPI. Unfortunately the notification thingy has
> > > > > no return value, but we can fix that.
> > > > >
> > > > > To confirm that theory, could you please try the hack below and add
> > > > > some instrumentation (trace_printk)?
> > > >
> > > > Applied, and it looks like that's exactly why the warning triggers, at least
> > > > on the platform I am testing on which is a dual-cluster ARM testchip.
> > > >
> > > > There is still a time window though where the CPU (the IPI target) can get
> > > > back to idle (tick_broadcast_pending still not set) before the CPU target of
> > > > the broadcast has a chance to run tick_handle_oneshot_broadcast (and set
> > > > tick_broadcast_pending), or am I missing something?
> > >
> > > Well, the tick_broadcast_pending bit is uninteresting if the
> > > force_broadcast bit is set. Because if that bit is set we know for
> > > sure, that we got woken with the cpu which gets the broadcast timer
> > > and raced back to idle before the broadcast handler managed to
> > > send the IPI.
> >
> > Gah, my bad sorry, I mixed things up. I thought
> >
> > tick_check_broadcast_pending()
> >
> > was checking against the tick_broadcast_pending mask not
> >
> > tick_force_broadcast_mask
>
> Yep, that's a misnomer. I just wanted to make sure that my theory is
> correct. I need to think about the real solution some more.
>
> We have two alternatives:
>
> 1) Make the clockevents_notify function have a return value.
>
> 2) Add something like the hack I gave you with a proper name.
>
> The latter has the beauty, that we just need to modify the platform
> independent idle code instead of going down to every callsite of the
> clockevents_notify thing.

Thank you.

I am not sure (1) would buy us anything compared to (2) and, as you said, we
would end up patching all callsites, so (2) is preferred.
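
For illustration only, a minimal userspace sketch of what a helper along the
lines of (2) might look like. All names here are hypothetical, a plain 32-bit
word stands in for the kernel's per-CPU mask, and the points at which the bit
is set and cleared are assumptions, not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8u

/* Hypothetical stand-in for tick_force_broadcast_mask: one bit per CPU. */
static uint32_t sketch_force_mask;

/* Assumed broadcast-handler side: flag a CPU that is about to get the IPI. */
void sketch_set_force_broadcast(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	sketch_force_mask |= UINT32_C(1) << cpu;
}

/* Assumed IPI-target side: the pending event has been handled. */
void sketch_clear_force_broadcast(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	sketch_force_mask &= ~(UINT32_C(1) << cpu);
}

/*
 * Called from generic idle code before committing to deep idle.
 * Returns true when the wake-up IPI is imminent, so deep idle is pointless.
 */
bool sketch_broadcast_wakeup_imminent(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return (sketch_force_mask & (UINT32_C(1) << cpu)) != 0;
}
```

The point of the sketch is that only the generic idle path needs to call the
check, which is why no clockevents_notify callsite has to change.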

As I mentioned, we can even just apply your fixes and let platform-specific
code deal with this optimization. At the end of the day, the idle driver just
has to check for pending IRQs/wake-up sources (which would cover all IRQs, not
just the timer IPI) if and when it has to start time-consuming operations,
like cache cleaning, to enter deep idle. If it goes into a shallow C-state, so
be it.
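
A rough sketch of that platform-side check, assuming the driver can obtain a
snapshot of pending wake-up sources as a bitmask. The enum, the function name
and the bitmask representation are all hypothetical, not an existing kernel
or cpuidle interface:

```c
#include <stdint.h>

/* Hypothetical idle states: shallow is cheap to enter, deep is expensive. */
enum sketch_idle_state { SKETCH_IDLE_SHALLOW, SKETCH_IDLE_DEEP };

/*
 * Decide which state to actually enter. pending_wakeups is an assumed
 * snapshot of pending IRQs/wake-up sources, one bit per source. If anything
 * is pending, skip the costly deep-idle preparation (e.g. cache cleaning)
 * and fall back to the shallow state.
 */
enum sketch_idle_state sketch_pick_idle_state(enum sketch_idle_state requested,
					      uint64_t pending_wakeups)
{
	if (requested == SKETCH_IDLE_DEEP && pending_wakeups != 0)
		return SKETCH_IDLE_SHALLOW;

	return requested;
}
```

Because the check looks at every pending source, it would cover any IRQ and
not only the timer IPI discussed in this thread.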

On x86 I think it is HW/FW that prevents C-state entry if IRQs are pending,
and on ARM it is likely to happen too, so I am just saying you should not
bother if you think the code becomes too hairy to justify this change.

Thank you very much for the fixes and your help,
Lorenzo
