[PATCH] Documentation: proc.txt: emphasize that iowait cannot be relied on

From: Alan Jenkins
Date: Fri Jul 12 2019 - 04:26:36 EST


CPU "iowait" time in /proc/stat does not work on my laptop.

I saw the documentation mention several problems with "iowait". However
each problem appeared to be qualified. It gave me the impression I could
probably account for each problem. My impression was wrong.

There are a couple of writeups explaining the specific problem I had.[1][2]

[1] "[RFC PATCH 0/8] rework iowait accounting", 2014-06-26:
https://lore.kernel.org/lkml/53ABE28F.6010402@xxxxxxxxxxxxxx/

[2] A recent writeup of my own:
https://unix.stackexchange.com/questions/517757/my-basic-assumption-about-system-iowait-does-not-hold/527836#527836

This might just be me. Partly, my limited knowledge of the scheduler
allowed for false assumptions. But I think we can emphasize more strongly
how broken iowait is on SMP. Overall, I aim to make it sound much scarier
to analyze iowait. I add some precise details, and also some
anxiety-inducing vagueness :-).

[Detailed reasons for the specific points I included:]

1. Let us say that "iowait _can_ be massively under-accounted". It is
likely to remain true in future. At least since v4.16, the
under-accounting problem seems very exposed on non-virtual, multi-CPU
systems. In theory the wheel might turn again; this exposure might be
reduced in future. But even on v4.15, I can reproduce the problem using
CPU affinity.

2. Point to NO_HZ_IDLE, as a good hint towards i) the nature of the problem
and ii) how widespread it is. To give a more comprehensive picture,
also point to NO_HZ_FULL and VIRT_CPU_ACCOUNTING_NATIVE.

Setting down my exact scenario would require a lot of specifics. That
would be going beyond the point. We could link to one of the writeups as
well, but I don't think we need to.

3. My own "use case" did not expose the problem when I ran it on a virtual
machine, even using my CPU affinity method.[2] I haven't tracked down
why. This is a significant qualification to point 1. Explicitly
acknowledge this. It's a pain, but it makes the main point easier to
verify, and hence more credible.

(I suspect this is common at least to small test VMs. It appears true
for both a Fedora 30 VM (5.1.x) and a Debian 9 VM (4.9.x). I also tried
some different storage options: virtio-blk vs. virtio-scsi vs. lsilogic.)

[:end of details]

Signed-off-by: Alan Jenkins <alan.christopher.jenkins@xxxxxxxxx>
---
Documentation/filesystems/proc.txt | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
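
For anyone who wants to look at the raw numbers on their own system, here
is a minimal sketch of sampling per-CPU iowait from /proc/stat. It is not
part of the patch. The helper names (parse_cpu_line, iowait_fraction,
read_cpu_lines) are made up for illustration. It assumes the column order
documented in proc.txt (user, nice, system, idle, iowait, irq, softirq,
steal) and ignores the guest columns. Given the problems described above,
treat the result as a rough hint at best.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Parse one 'cpu' or 'cpuN' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    # Skip the label in parts[0]; keep only the first eight counters.
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))

def iowait_fraction(before, after):
    """Fraction of elapsed ticks reported as iowait between two samples.

    The iowait counter can decrease, so the result may be negative.
    """
    deltas = {k: after[k] - before[k] for k in before}
    total = sum(deltas.values())
    return deltas["iowait"] / total if total > 0 else 0.0

def read_cpu_lines(path="/proc/stat"):
    """Return {label: counters} for every line whose label starts with 'cpu'."""
    with open(path) as f:
        return {l.split()[0]: parse_cpu_line(l) for l in f if l.startswith("cpu")}

if __name__ == "__main__":
    first = read_cpu_lines()
    time.sleep(1)  # sampling interval, chosen arbitrarily
    second = read_cpu_lines()
    for label in sorted(first):
        if label in second:
            print(label, round(iowait_fraction(first[label], second[label]), 3))
```

Comparing the per-CPU lines against the aggregate "cpu" line while an
I/O-bound task is pinned to one CPU is one way to see how far the reported
figure falls short of what you would expect.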

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 66cad5c86171..f1da71cd276e 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1348,16 +1348,23 @@ second). The meanings of the columns are as follows, from left to right:
- nice: niced processes executing in user mode
- system: processes executing in kernel mode
- idle: twiddling thumbs
-- iowait: In a word, iowait stands for waiting for I/O to complete. But there
- are several problems:
- 1. Cpu will not wait for I/O to complete, iowait is the time that a task is
- waiting for I/O to complete. When cpu goes into idle state for
- outstanding task io, another task will be scheduled on this CPU.
- 2. In a multi-core CPU, the task waiting for I/O to complete is not running
- on any CPU, so the iowait of each CPU is difficult to calculate.
- 3. The value of iowait field in /proc/stat will decrease in certain
+- iowait: In a word, iowait stands for waiting for I/O to complete. This
+ number is not reliable. The problems include:
+ 1. A CPU does not wait for I/O to complete; iowait is the time that a task
+ is waiting for I/O to complete. When a CPU goes into idle state for
+ outstanding task I/O, another task will be scheduled on this CPU.
+ 2. iowait was extended to support systems with multiple CPUs. But the
+ extended version is misleading. Consider a two-CPU system, where you see
+ 50% iowait. This could represent two tasks that could use 100% of both
+ CPUs, if they were not waiting for I/O.
+ 3. iowait can be massively under-accounted on modern kernels. The iowait
+ code does not account for the behaviour of NO_HZ_IDLE, NO_HZ_FULL, or
+ VIRT_CPU_ACCOUNTING_NATIVE on multi-CPU systems. The amount of
+ under-accounting varies depending on the exact system configuration and
+ kernel version. The effects might be less obvious when running in a
+ virtual machine.
+ 4. The value of iowait field in /proc/stat will decrease in certain
conditions.
- So, the iowait is not reliable by reading from /proc/stat.
- irq: servicing interrupts
- softirq: servicing softirqs
- steal: involuntary wait
--
2.21.0