Re: [patch 2/3] pvclock: detect watchdog reset at pvclock read

From: Marcelo Tosatti
Date: Wed Oct 09 2013 - 17:27:03 EST


On Wed, Oct 09, 2013 at 09:55:19AM -0400, Don Zickus wrote:
> On Tue, Oct 08, 2013 at 07:08:11PM -0300, Marcelo Tosatti wrote:
> > On Tue, Oct 08, 2013 at 09:37:05AM -0400, Don Zickus wrote:
> > > On Mon, Oct 07, 2013 at 10:05:17PM -0300, Marcelo Tosatti wrote:
> > > > Implement reset of kernel watchdogs at pvclock read time. This avoids
> > > > adding special code to every watchdog.
> > > >
> > > > This is possible for watchdogs which measure time based on sched_clock() or
> > > > ktime_get() variants.
> > > >
> > > > Suggested by Don Zickus.
> > > >
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > >
> > > Awesome. Thanks for figuring this out Marcelo. Does that mean we can
> > > revert commit 5d1c0f4a now? :-)
> >
> > Unfortunately no: soft lockup watchdog does not measure time based on
> > sched_clock but on hrtimer interrupt count :-(
>
> I believe it does. See __touch_watchdog() which calls get_timestamp() -->
> local_clock(). That is how it calculates the duration of the softlockup.
>
> Now with your patch, it just sets the timestamp to zero with
> touch_softlockup_watchdog_sync(), which is fine. It will just sync up the
> clock, set a new timestamp, and check again in the next hrtimer interrupt.
>
> So I guess I am confused what that commit does compared to this patch.
>
> > (see the softlockup code in question, perhaps you can point to
> > something that i'm missing).
> >
> > BTW, are you OK with printing additional steal time information?
> > https://lkml.org/lkml/2013/6/27/755
>
> Well, I thought this patch was supposed to replace that patch? Why do you
> still need that patch?

From https://lkml.org/lkml/2013/7/3/675:

"Agree. However, can't see how there is a way around "having custom
kvm/paravirt splat all over", for watchdogs that do:

1. check for watchdog resets
2. read time via sched_clock or xtime.
3. based on 2, decide whether there has been a longer delay than
acceptable.

This is the case for the softlockup timer interrupt. So the splat there
is necessary (otherwise any potential notification of vm-pause event
noticed at 2 might be missed because it's checked at 1).

For watchdogs that measure time based on interrupt events (such as hung
task, rcu_cpu_stall), checking for the notification at sched_clock or
lower is fine."
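
To make the ordering problem in steps 1-3 concrete, here is a minimal
user-space sketch. It is not kernel code: every name in it (fake_clock_ns,
vm_was_paused, read_clock, struct watchdog, and so on) is a hypothetical
stand-in, and it assumes the pause notification is raised as a side effect
of the clock read. The first check function follows steps 1-3 as written
and reports a false lockup after a pause; the second adds an explicit pause
test after the time comparison, which is roughly what a custom check inside
the watchdog would do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not kernel APIs. */
static uint64_t fake_clock_ns;        /* simulated guest clock */
static bool vm_was_paused;            /* set while the "VM" was paused */
static bool watchdog_reset_pending;   /* raised when a clock read notices the pause */

struct watchdog {
	uint64_t touch_ts;   /* last time the watchdog was touched */
	uint64_t thresh_ns;  /* longest acceptable delay */
};

/* Clock read that notices a pause and requests a watchdog reset. */
static uint64_t read_clock(void)
{
	if (vm_was_paused) {
		vm_was_paused = false;
		watchdog_reset_pending = true;
	}
	return fake_clock_ns;
}

/* Pretend the VM was paused for ns nanoseconds. */
static void simulate_pause(uint64_t ns)
{
	fake_clock_ns += ns;
	vm_was_paused = true;
}

/* Steps 1-3 in order. Returns true if a lockup would be reported. */
static bool check_reset_then_read(struct watchdog *wd)
{
	if (watchdog_reset_pending) {             /* 1. check for reset */
		watchdog_reset_pending = false;
		wd->touch_ts = read_clock();
		return false;
	}
	uint64_t now = read_clock();              /* 2. read time; the pause is noticed here */
	return now - wd->touch_ts > wd->thresh_ns; /* 3. decide; too late to honor the reset */
}

/* Same check, followed by an explicit pause test before reporting. */
static bool check_with_explicit_pause_test(struct watchdog *wd)
{
	bool late = check_reset_then_read(wd);

	if (late && watchdog_reset_pending) {
		watchdog_reset_pending = false;
		wd->touch_ts = fake_clock_ns;
		return false;
	}
	return late;
}

/* One 60s pause against a 20s threshold; returns whether a lockup is reported. */
static bool run_scenario(bool explicit_test)
{
	struct watchdog wd = { .touch_ts = 0, .thresh_ns = 20ULL * 1000000000ULL };

	fake_clock_ns = 0;
	vm_was_paused = false;
	watchdog_reset_pending = false;
	simulate_pause(60ULL * 1000000000ULL);

	return explicit_test ? check_with_explicit_pause_test(&wd)
			     : check_reset_then_read(&wd);
}
```

In this sketch run_scenario(false) reports a lockup even though the delay
was caused by the pause, while run_scenario(true) suppresses it.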
