Re: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detectedby 0, t=10002 jiffies)

From: John Stultz
Date: Mon Oct 08 2012 - 19:40:16 EST


On 09/30/2012 04:59 AM, Fengguang Wu wrote:
On Sun, Sep 30, 2012 at 01:32:46PM +0200, Avi Kivity wrote:
On 09/30/2012 01:23 PM, Fengguang Wu wrote:
On Sun, Sep 30, 2012 at 01:10:55PM +0200, Avi Kivity wrote:
On 09/28/2012 05:35 AM, Paul E. McKenney wrote:
On Thu, Sep 27, 2012 at 12:40:44PM +0800, Fengguang Wu wrote:
On Wed, Sep 26, 2012 at 09:28:50PM -0700, Paul E. McKenney wrote:
On Thu, Sep 27, 2012 at 10:54:00AM +0800, Fengguang Wu wrote:
On Wed, Sep 26, 2012 at 09:45:43AM -0700, Paul E. McKenney wrote:
On Wed, Sep 26, 2012 at 04:15:01PM +0800, Fengguang Wu wrote:
[ . . . ]

But could you also please send your .config file and a description of
.config attached.

the workload you are running?
It's basically the below commands. The exact initrd is not relevant in
this case because it's a boot time warning before user space is
started. The stalls roughly happen 1 time on every 10 boots.
Yow!!!

You have severe cross-CPU time-synchronization problems. See for
example the first dmesg, with the relevant part extracted right here.
One CPU believes that it is about 37 seconds past boot, and the other
CPU beleives that it is about 137 seconds past boot. Given that large
of a time difference, an RCU CPU stall warning is expected behavior.
Good spot! Yeah I noticed that huge timestamp gap, however didn't take
it seriously enough..

Get your two CPUs in agreement about what time it is, and I bet that
the CPU stall warnings will go away.
Possibly KVM related? Because the warnings show up in many test boxes
running KVM and so is not likely some hardware specific issue.
I vaguely recall seeing something recently. But let's ask the KVM and
timekeeping guys.
>From the logs it looks like hpet (why not kvmclock?) is used for the
clock, it should not generate such drifts since it is a global clock.
Can you verify current_clocksource on a boot that actually failed (in
case the clocksource is switched during runtime)?
I've checked out the dmesg that's cited by Paul, attached. Yes it
contains lines

[ 4.970051] Switching to clocksource hpet

and then

[ 7.250353] Switching to clocksource tsc

And there is no kvm-clock lines. Oh well for this particular kernel:

Ah, tsc will certainly break on kvm if the hardware doesn't provide a
constant tsc source. I'm surprised the guest kernel didn't detect it
and switch back to hpet though.
Thanks, it's good to know the root cause. All the dmesgs show the same hpet+tsc
switching pattern (and never switch back):

$ grep Switching dmesg-kvm_bisect2-inn-*21
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[ 4.111415] Switching to clocksource hpet
dmesg-kvm_bisect2-inn-41931-2012-09-27-10-37-51-3.6.0-rc7-bisect2-00078-g593d100-21:[ 6.550098] Switching to clocksource tsc

Is this still an open issue? Fengguang's mail sounds like its resolved, but I'm not sure it is.

The switching from HPET -> TSC I believe is expected, as the refined calibration will delay the TSC from being registered for a few seconds. However, its unclear why the TSC, if it is faulty, isn't being caught and demoted by the clocksource watchdog.

I'm also curious why this originally bisected down to 06ae115a1d551cd952d8 (when using the kvm clock) if it was more of a hardware issue. And in those logs, I don't see the printk time-stamp inconsistencies that were alluded to in this thread.

Fengguang: Is this still reproducible? Do you have any details (dmesg) about host system as well?

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/