I've been tracking an odd bug that may involve the RCU NOHZ code and I
just want to know if you have any ideas on debugging and/or what might be
wrong. Note the bug happens on *BOTH* upstream and the current RHEL6 tree.
The data in this email is from running on RHEL6 because that's what I happen
to be running ATM. The result, however, is _identical_ to that of linux.git
latest.
The attached program compares userspace TSC reads to the time returned from
CLOCK_REALTIME[1]. The test does the following:

    read tsc1
    get CLOCK_REALTIME value
    read tsc2

and then compares the TSC reads against the CLOCK_REALTIME value to see if
they are in sync with each other.
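
For anyone who wants to reproduce the shape of the measurement without the
attachment, here is a rough sketch of the bracketing step only. It is NOT the
attached program: Python cannot read the TSC directly, so
time.perf_counter_ns() stands in for the two TSC reads, and the helper names
sample_once() and collect() are made up for illustration.

```python
import time


def sample_once():
    """Bracket one CLOCK_REALTIME read between two counter reads.

    perf_counter_ns() is only a stand-in for the TSC reads described above;
    the real test reads the TSC from userspace.
    """
    c1 = time.perf_counter_ns()                       # "read tsc1"
    rt = time.clock_gettime_ns(time.CLOCK_REALTIME)   # "get clock value"
    c2 = time.perf_counter_ns()                       # "read tsc2"
    return c1, rt, c2


def collect(n=30):
    """Take n bracketed samples back to back."""
    return [sample_once() for _ in range(n)]


if __name__ == "__main__":
    for i, (c1, rt, c2) in enumerate(collect(5)):
        # c2 - c1 is the width of the bracket around the clock read.
        print(i, c2 - c1, rt)
```

The width of the bracket (c2 - c1) bounds how far apart the counter read and
the clock read can be, which is why both reads are taken.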
[I'm leaving out the guts of the analysis here. It is sufficient to show
examples of "good" data and "bad" data IMO.]
On a good run, we see little variance between the values:
0 144 0.1
1 138 1.8
2 147 -2.9
[snip]
29 144 -0.6
n: 30, slope: 0.50 (1.99 GHz), dev: 1.1 ns, max: 2.9 ns
On a bad run, there is a lot of variance between the values:
0 144 -346.0
1 138 1410.8
2 138 -806.9
3 141 4006.6
4 147 -3996.1
29 141 -50.3
n: 30, slope: 0.50 (1.99 GHz), dev: 1231.4 ns, max: 4006.6 ns
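
To make the summary lines easier to read: in the rows shown, "max" matches the
largest absolute value in the third column (2.9 and 4006.6), and a slope of
0.50 ns per TSC tick corresponds to roughly 2 GHz. One way such numbers could
be produced is a straight-line fit of clock time against TSC. The sketch below
assumes ordinary least squares and an RMS "dev"; the actual analysis is left
out of this email and may differ, and fit_residuals() is a hypothetical name.

```python
import math


def fit_residuals(counter, clock_ns):
    """Least-squares fit clock_ns = a + slope * counter.

    Returns (slope, dev, worst, residuals), where dev is the RMS residual and
    worst is the largest absolute residual. Treating "dev" as RMS is an
    assumption, not something stated in the report.
    """
    n = len(counter)
    mx = sum(counter) / n
    my = sum(clock_ns) / n
    sxx = sum((x - mx) ** 2 for x in counter)
    sxy = sum((x - mx) * (y - my) for x, y in zip(counter, clock_ns))
    slope = sxy / sxx
    resid = [(y - my) - slope * (x - mx) for x, y in zip(counter, clock_ns)]
    dev = math.sqrt(sum(r * r for r in resid) / n)
    worst = max(abs(r) for r in resid)
    return slope, dev, worst, resid


if __name__ == "__main__":
    # Synthetic data: a 2 GHz counter (0.5 ns per tick) with one late sample.
    ticks = [i * 2000 for i in range(30)]
    clock = [1_000_000 + 0.5 * t for t in ticks]
    clock[10] += 4000.0  # one clock read that is 4 us off the line
    slope, dev, worst, _ = fit_residuals(ticks, clock)
    print("n: %d, slope: %.2f (%.2f GHz), dev: %.1f ns, max: %.1f ns"
          % (len(ticks), slope, 1.0 / slope, dev, worst))
```

With a fit like this, a single clock read that lands far off the line is
enough to inflate both dev and max, which is the pattern in the bad run.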
It was noted by the bug reporter that specifying "nohz=off" resolved the
problem. I tested with "nohz=off" and AFAICT it fixes the issue. I started
out debugging by assuming that delays in the c-state transitions were not being
properly accounted for in the timing calculations.
I ran a baseline test on an unmodified kernel (with no extra boot options) and
confirmed that powertop shows the CPUs entering deep c-states while the test was
running for 300 runs.
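
As a cross-check on powertop (this is not what I used above, just an
alternative), the per-state entry counters that cpuidle exports in sysfs can
be sampled before and after a run. The sketch assumes the usual
/sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,usage} layout; the function
names are hypothetical.

```python
import glob
import os
import time


def read_cpuidle_usage(cpu_dir):
    """Return {state name: entry count} for one CPU's cpuidle states."""
    counts = {}
    pattern = os.path.join(cpu_dir, "cpuidle", "state*")
    for state_dir in sorted(glob.glob(pattern)):
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state_dir, "usage")) as f:
            counts[name] = int(f.read().strip())
    return counts


def usage_delta(before, after):
    """How many times each state was entered between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


if __name__ == "__main__":
    cpu0 = "/sys/devices/system/cpu/cpu0"
    before = read_cpuidle_usage(cpu0)
    time.sleep(1.0)  # run the test here instead of sleeping
    after = read_cpuidle_usage(cpu0)
    for name, entered in usage_delta(before, after).items():
        print(name, entered)
```

A nonzero delta for the deeper states during the run would agree with what
powertop reports.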
I then instrumented the PM QoS and the power management code (specifically
cpuidle). I added a large number of printk()s to monitor the CPU transitions,
and monitored the power states via powertop to verify that the system was
behaving correctly wrt PM QoS.
If you modify the tstsc script to run 300 times with this modified kernel, and
run powertop in the middle of the script, you will see that the processors do
NOT enter deep c-states. This means that PM QoS is doing its job correctly.