Re: NO_HZ_IDLE causes consistently low cpu "iowait" time (and higher cpu "idle" time)

From: Alan Jenkins
Date: Wed Jul 03 2019 - 12:09:14 EST


On 03/07/2019 15:06, Doug Smythies wrote:
On 2019.07.01 08:34 Alan Jenkins wrote:

Hi
Hi,

I tried running a simple test:

dd if=testfile iflag=direct bs=1M of=/dev/null

With my default settings, `vmstat 10` shows something like 85% idle time
to 15% iowait time. I have 4 CPUs, so this is much less than one CPU
worth of iowait time.

If I boot with "nohz=off", I see idle time fall to 75% or below, and
iowait rise to about 25%, equivalent to one CPU. That is what I had
originally expected.
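
For anyone repeating the measurement without vmstat, here is a minimal sketch of the arithmetic, assuming the aggregate "cpu" line of /proc/stat in the field order given in proc(5). The helper names (`parse_cpu_line`, `idle_iowait_percent`, `one_cpu_share`, `read_cpu_line`) are made up for illustration and are not part of any tool mentioned in this thread.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Map the first eight counters of a "cpu ..." line to their names."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # guest and guest_nice are skipped: they are already counted in user/nice.
    return dict(zip(FIELDS, (int(p) for p in parts[1:9])))


def idle_iowait_percent(before, after):
    """Return (idle %, iowait %) between two samples of the "cpu" line."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: a[name] - b[name] for name in a}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return 100.0 * delta["idle"] / total, 100.0 * delta["iowait"] / total


def one_cpu_share(ncpus):
    """Percentage of total CPU time that corresponds to one full CPU."""
    return 100.0 / ncpus


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate "cpu" line."""
    with open(path) as f:
        return f.readline()


def sample(interval_s=10, path="/proc/stat"):
    """Take two samples interval_s apart, roughly like one `vmstat 10` row."""
    before = read_cpu_line(path)
    time.sleep(interval_s)
    return idle_iowait_percent(before, read_cpu_line(path))
```

With 4 CPUs, `one_cpu_share(4)` is 25.0, so a single dd blocked on I/O would be expected to show up as about 25% iowait if all of its waiting were accounted as iowait.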

(I can also see my expected numbers, if I disable *all* C-states and
force polling using `pm_qos_resume_latency_us` in sysfs).

The numbers above are from a kernel somewhere around v5.2-rc5. I saw
the "wrong" results on some previous kernels as well. I just now
realized the link to NO_HZ_IDLE.[1]

[1]
https://unix.stackexchange.com/questions/517757/my-basic-assumption-about-system-iowait-does-not-hold/527836#527836

I did not find any information about this high level of inaccuracy. Can
anyone explain, is this behaviour expected?
I'm not commenting on expected behaviour or not, just that it is
inconsistent.

I found several patches that mentioned "iowait" and NO_HZ_IDLE. But if
they described this problem, it was not clear to me.

I thought this might also be affecting the "IO pressure" values from the
new "pressure stall information"... but I am too confused already, so I
am only asking about iowait at the moment :-).
Using your workload, I confirm inconsistent behaviour for /proc/stat
(which vmstat uses) between kernels 4.15, 4.16, and 4.17:
4.15 does what you expect, whether idle states are enabled or disabled.
4.16 doesn't do what you expect in either case (although it is a little erratic).
>= 4.17 does what you expect with only idle state 0 enabled, and doesn't otherwise.
Actual test data from vmstat (/proc/stat) (8 CPUs, 12.5% = 1 CPU):

Kernel   idle/iowait %   Idle states >= 1
4.15     88/12           enabled
4.15     88/12           disabled
4.16     99/1            enabled
4.16     99/1            disabled
4.17     98/2            enabled
4.17     88/12           disabled

Note 1: I never booted with "nohz=off" because the tick never turns off for
idle state 0, which is good enough for testing.
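
The message does not say how the idle states were toggled for the table above. One possible way, sketched here under the assumption that the standard cpuidle sysfs `disable` attribute is used, is to write "1" into it for every state with index 1 or higher. The function name `set_deep_idle_states` and its parameters are hypothetical, and writing to the real sysfs tree needs root.

```python
import glob
import os


def set_deep_idle_states(disabled, root="/sys/devices/system/cpu", min_state=1):
    """Write 1 (disable) or 0 (enable) to every cpuidle state >= min_state.

    Returns the list of sysfs files that were written.
    """
    value = "1" if disabled else "0"
    pattern = os.path.join(root, "cpu[0-9]*", "cpuidle", "state[0-9]*", "disable")
    changed = []
    for path in sorted(glob.glob(pattern)):
        # The parent directory is named stateN; extract N.
        state_dir = os.path.basename(os.path.dirname(path))
        index = int(state_dir[len("state"):])
        if index >= min_state:
            with open(path, "w") as f:
                f.write(value)
            changed.append(path)
    return changed
```

Calling `set_deep_idle_states(True)` leaves only state 0 usable; `set_deep_idle_states(False)` restores the others.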

Note 2: Myself, I don't use /proc/stat for idle time statistics. I use:
/sys/devices/system/cpu/cpu*/cpuidle/state*/time
And they seem to always be consistent at the higher idle percentage number.
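
As a rough sketch of how those cpuidle counters could be turned into an idle percentage for comparison with /proc/stat: the `time` files hold cumulative residency in microseconds, so the difference between two snapshots divided by the wall-clock time across all CPUs gives the idle share. The function names below are hypothetical, and this assumes time spent in every state (including state 0) should count as idle.

```python
import glob
import os


def cpuidle_time_us(root="/sys/devices/system/cpu"):
    """Sum cpuidle residency (microseconds) over all CPUs, keyed by state."""
    pattern = os.path.join(root, "cpu[0-9]*", "cpuidle", "state[0-9]*", "time")
    totals = {}
    for path in glob.glob(pattern):
        # The parent directory is named stateN; use it as the key.
        state = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            totals[state] = totals.get(state, 0) + int(f.read().strip())
    return totals


def idle_percent(before, after, interval_s, ncpus):
    """Idle share of total CPU time between two cpuidle_time_us() snapshots."""
    idle_us = sum(after.values()) - sum(before.values())
    return 100.0 * idle_us / (interval_s * 1e6 * ncpus)
```

Take one snapshot, wait a known interval while the dd test runs, take a second snapshot, and pass both to `idle_percent` along with the interval and the CPU count.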

Unless someone has some insight, the next step is kernel bisection,
once for between kernel 4.15 and 4.16, then again between 4.16 and 4.17.
The second bisection might go faster with knowledge gained from the first.
Alan: Can you do kernel bisection? I can only do it starting maybe Friday.

... Doug

Thanks for your help Doug!

I wish I had a faster CPU :-), but I'm familiar with bisection. I have started and I'm down to about 8 minute builds, so I can probably be done before Friday.

Alan