Re: nohz problem with idle time on old hardware
From: Thomas Gleixner
Date: Wed Nov 13 2013 - 09:02:24 EST
On Wed, 13 Nov 2013, Matthew Whitehead wrote:
> I was testing the 3.12 kernel on some _old_ hardware and I uncovered a bug.
> It arises when nohz=on and goes away with nohz=off. On a crusty dual Pentium-1
> system that is completely idle, the sar utility reports 0% idle time on cpu0
> and 100% idle on cpu1. Cpu0 _should_ also be reporting 100% idle, but instead
> it reports around 75% system time and 25% user time.
>
> The problem was diagnosed by Steve Rostedt with help from John
> Stultz. The old system declares the dual TSCs unstable, and backs
> down to a timesource of refined-jiffies. Apparently refined-jiffies
> and jiffies are not a usable timesource for nohz, but we don't check
> for that case because most modern systems have several reliable
> hardware timesources.
Wrong.
> John suggested that we turn off nohz unless a usable hardware timesource is
> present.
nohz already depends on two things:
1) A reliable clocksource which is valid for highres/nohz
2) A per cpu clockevent device which supports one shot mode.
and those are evaluated at runtime before we switch into NOHZ mode.
And neither jiffies nor refined-jiffies qualifies as a valid clocksource.
So there is something else wrong.
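For illustration, the two conditions above can be modelled in a few lines of Python. This is only a sketch of the gating logic, not kernel code. The names CpuTimerState, may_switch_to_nohz and NOT_VALID_FOR_HRES are made up for the example. Treating every clocksource other than jiffies and refined-jiffies as valid is a simplifying assumption; as far as I know the kernel decides this from a per-clocksource flag (CLOCK_SOURCE_VALID_FOR_HRES), not from the name. The sysfs path is the standard location of the active clocksource name.

```python
from dataclasses import dataclass
from pathlib import Path

# Standard sysfs file that exposes the name of the active clocksource.
CURRENT_CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)

# Assumption for this sketch: only the two jiffies-based clocksources are
# treated as not valid for highres/nohz, as stated in the mail above.
NOT_VALID_FOR_HRES = {"jiffies", "refined-jiffies"}


@dataclass
class CpuTimerState:
    """Hypothetical snapshot of what the runtime check looks at."""
    clocksource: str          # e.g. "hpet", "acpi_pm", "refined-jiffies"
    clockevent_oneshot: bool  # per cpu clockevent device supports one shot mode


def read_current_clocksource(path: Path = CURRENT_CLOCKSOURCE) -> str:
    """Return the active clocksource name as reported by sysfs."""
    return path.read_text().strip()


def may_switch_to_nohz(state: CpuTimerState) -> bool:
    """Model of the two runtime conditions; both must hold."""
    clocksource_ok = state.clocksource not in NOT_VALID_FOR_HRES
    return clocksource_ok and state.clockevent_oneshot
```

On a live system, may_switch_to_nohz(CpuTimerState(read_current_clocksource(), True)) gives a quick first guess. If it says False for refined-jiffies while the machine still shows the nohz symptoms, that matches the conclusion above that the cause lies elsewhere.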
Thanks,
tglx