Re: nohz problem with idle time on old hardware

From: Thomas Gleixner
Date: Wed Nov 13 2013 - 09:02:24 EST


On Wed, 13 Nov 2013, Matthew Whitehead wrote:

> I was testing the 3.12 kernel on some _old_ hardware and I uncovered a bug.
> It arises when nohz=on and goes away with nohz=off. On a crusty dual Pentium-1
> system that is completely idle, the sar utility reports 0% idle time on cpu0
> and 100% idle on cpu1. Cpu0 _should_ also be reporting 100% idle, but instead
> it reports around 75% system time and 25% user time.
>
> The problem was diagnosed by Steve Rostedt with help from John
> Stultz. The old system declares the dual TSCs unstable, and backs
> down to a timesource of refined-jiffies. Apparently refined-jiffies
> and jiffies are not a usable timesource for nohz, but we don't check
> for that case because most modern systems have several reliable
> hardware timesources.

Wrong.

> John suggested that we turn off nohz unless a usable hardware timesource is
> present.

nohz already depends on two things:

1) A reliable clocksource which is valid for highres/nohz

2) A per cpu clockevent device which supports one shot mode.

and those are evaluated at runtime before we switch into NOHZ mode.

And neither jiffies nor refined-jiffies qualifies as a valid clocksource.
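
As a rough illustration of the two conditions above, here is a minimal
user-space sketch of the runtime gate. It is not the kernel code: the
struct names, the flag names and model_can_switch_to_nohz() are all
hypothetical stand-ins, and the real checks live in the tick and
timekeeping code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's capability flags. */
enum {
	MODEL_CS_VALID_FOR_HRES = 1 << 0, /* clocksource valid for highres/nohz */
	MODEL_EVT_FEAT_ONESHOT  = 1 << 0, /* clockevent supports one shot mode */
};

/* Simplified model of a clocksource: a name plus capability flags. */
struct model_clocksource {
	const char *name;
	unsigned int flags;
};

/* Simplified model of a per cpu clockevent device. */
struct model_clockevent {
	const char *name;
	unsigned int features;
};

/*
 * Returns true only when both conditions hold:
 *  1) the current clocksource is marked valid for highres/nohz
 *  2) the per cpu clockevent device supports one shot mode
 */
static bool model_can_switch_to_nohz(const struct model_clocksource *cs,
				     const struct model_clockevent *evt)
{
	assert(cs != NULL && evt != NULL);

	if (!(cs->flags & MODEL_CS_VALID_FOR_HRES))
		return false;
	if (!(evt->features & MODEL_EVT_FEAT_ONESHOT))
		return false;
	return true;
}
```

In this model a clocksource like refined-jiffies carries no
"valid for highres" flag, so the function returns false no matter what
the clockevent device supports, and the switch into NOHZ mode would
never happen.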

So there is something else wrong.

Thanks,

tglx
