Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrtimer commit

From: Thomas Gleixner
Date: Mon Jul 16 2012 - 07:26:43 EST


On Mon, 16 Jul 2012, Thomas Gleixner wrote:

> On Mon, 16 Jul 2012, Rafael J. Wysocki wrote:
>
> > On Monday, July 16, 2012, Thomas Gleixner wrote:
> > > On Sun, 15 Jul 2012, Rafael J. Wysocki wrote:
> > > > To everyone involved: the fact that this change, which was likely to introduce
> > > > regressions from the look of it alone, has been pushed to Linus (and to -stable
> > > > at the same time!) so late in the cycle, is seriously disappointing.
> > >
> > > Well, we spent a massive amount of time in testing, reviewing and
> > > discussion and it definitely did not break suspend/resume here.
> >
> > I'm not saying that you didn't consider it thoroughly, but unfortunately you
> > did overlook this particular issue, didn't you?
> >
> > > This was not pushed without a lot of thought and in fact what you are
> > > seeing is another long-standing bug in the timekeeping resume code,
> > > which was just papered over by the incorrect handling of the
> > > clock-was-set cases in the other parts of the system.
> > >
> > > Does the following patch fix the problem for you ?
> >
> > Yes, it does, thanks!
> >
> > > @John: Should that clear ntp as well or is it enough to set ntp_error
> > > to 0 ?
> > >
> > > /me really goes on vacation now.
> >
> > So who's going to take care of the patch? :-)
>
> I'm still packing gear. So I'll push it into timers/urgent.

Actually that's a bad idea. John wants to double check vs. the
ntp_clear question. So John can send it to Linus directly.

@John: Should it be: timekeeping_update(true)

Now I'm gone for real.

Thanks,

tglx
-----
Subject: timekeeping: Add missing update call in timekeeping_resume()
From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Mon, 16 Jul 2012 11:47:31 +0200 (CEST)

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for
quite some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@xxxxxxxxxxxxxx>
Reported-by-and-tested-by: "Rafael J. Wysocki" <rjw@xxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Linux PM list <linux-pm@xxxxxxxxxxxxxxx>
Cc: John Stultz <johnstul@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Prarit Bhargava <prarit@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Index: tip/kernel/time/timekeeping.c
===================================================================
--- tip.orig/kernel/time/timekeeping.c
+++ tip/kernel/time/timekeeping.c
@@ -717,6 +717,7 @@ static void timekeeping_resume(void)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);

 	touch_softlockup_watchdog();
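
For readers who want the failure mode from the changelog in isolation, here is a rough user-space sketch. It is not kernel code: struct toy_timekeeper and every toy_* function are hypothetical names made up for illustration, and the real timekeeper publishes far more derived state than a single field. It only shows the general pattern, assuming that readers consume a derived copy which is refreshed by an explicit update call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, NOT the kernel's timekeeper structure. */
struct toy_timekeeper {
	uint64_t xtime_sec;	/* authoritative time, changed by the resume path */
	uint64_t cached_sec;	/* derived copy that other code reads */
};

/* Stand-in for the role of timekeeping_update(): republish derived data. */
static void toy_update(struct toy_timekeeper *tk)
{
	tk->cached_sec = tk->xtime_sec;
}

/* Resume path as before the patch: state is changed, nothing republishes it. */
static void toy_resume_buggy(struct toy_timekeeper *tk, uint64_t slept_sec)
{
	tk->xtime_sec += slept_sec;
}

/* Resume path with the missing update call added. */
static void toy_resume_fixed(struct toy_timekeeper *tk, uint64_t slept_sec)
{
	tk->xtime_sec += slept_sec;
	toy_update(tk);
}

/* What a consumer of the derived data sees. */
static uint64_t toy_read(const struct toy_timekeeper *tk)
{
	return tk->cached_sec;
}

/* Small self-check contrasting the two resume paths. */
static void toy_selfcheck(void)
{
	struct toy_timekeeper a = { 100, 100 };
	struct toy_timekeeper b = { 100, 100 };

	toy_resume_buggy(&a, 50);
	assert(toy_read(&a) == 100);	/* stale: sleep time not visible */

	toy_resume_fixed(&b, 50);
	assert(toy_read(&b) == 150);	/* consistent after the update */
}
```

Calling toy_selfcheck() from a test program exercises both paths. In this toy model the buggy path leaves the reader with the pre-suspend value, which is the kind of inconsistency the one-line patch above is meant to remove.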
