[RFC] Documentation for io-accounting / reporting via procfs

From: devzero
Date: Sat Mar 03 2007 - 09:19:34 EST


Hi !

I have tried to compile some docs for the new and very useful io-accounting feature, mostly by grabbing the existing information from lkml/sources.

Feel free to comment, modify, ignore - whatever.

If you like it, maybe this can be merged into Documentation/filesystems/proc.txt later !?

regards
Roland Kletzing
Sysadmin


--------------------------------------------------------------------------------

/proc/$PID/io - Show the IO accounting fields.

Example
-------

test:/tmp # dd if=/dev/zero of=/tmp/test.dat &
[1] 3828

test:/tmp # cat /proc/3828/io
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
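
The file is a plain list of "name: value" lines, so it is easy to consume
from a script. A minimal sketch in Python, assuming only the format shown
above; the helper names parse_io() and read_io() are made up for this
illustration and are not part of any existing tool:

```python
def parse_io(text):
    """Turn the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip anything that is not a 'name: <decimal number>' line.
        if sep and value.strip().isdigit():
            fields[name.strip()] = int(value)
    return fields


def read_io(pid="self"):
    """Read and parse /proc/<pid>/io (hypothetical helper; the file only
    exists on kernels built with CONFIG_TASK_IO_ACCOUNTING)."""
    with open("/proc/%s/io" % pid) as f:
        return parse_io(f.read())


# Sample text copied from the dd example above.
SAMPLE = """\
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""

stats = parse_io(SAMPLE)
print(stats["wchar"], stats["write_bytes"])
```

On a running system, read_io(3828) would return the same kind of dict for
the dd process from the example.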


Description
-----------

rchar: (unsigned long long)

The number of bytes which this task has caused to be read from storage.
This is simply the sum of bytes which this process passed to read() and
pread(). It includes things like tty IO and it is unaffected by whether
or not actual physical disk IO was required (the read might have been
satisfied from pagecache).


wchar: (unsigned long long)

The number of bytes which this task has caused, or shall cause, to be written
to disk. Similar caveats apply here as with rchar.


syscr: (unsigned long long)

I/O counter: read syscalls
Attempt to count the number of read I/O operations, i.e. syscalls like read()
and pread().


syscw: (unsigned long long)

I/O counter: write syscalls
Attempt to count the number of write I/O operations, i.e. syscalls like write()
and pwrite().
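
Combined with rchar and wchar, the syscall counters give the average
transfer size per call. A small sketch using the numbers from the dd example
above; the function name bytes_per_call() is hypothetical:

```python
def bytes_per_call(nbytes, ncalls):
    """Average transfer size per syscall; 0.0 if no calls were counted."""
    return nbytes / ncalls if ncalls else 0.0


# Values copied from the dd example above.
rchar, syscr = 323934931, 632687
wchar, syscw = 323929600, 632675

avg_read = bytes_per_call(rchar, syscr)
avg_write = bytes_per_call(wchar, syscw)

print("avg read:  %.1f bytes/call" % avg_read)
print("avg write: %.1f bytes/call" % avg_write)
```

Both averages come out at about 512 bytes, which matches dd's default block
size.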


read_bytes: (unsigned long long)

I/O counter: bytes read
Attempt to count the number of bytes which this process really did cause to
be fetched from the storage layer. Done at the submit_bio() level, so it is
accurate for block-backed filesystems. <please add status regarding NFS and CIFS
at a later time>
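
The difference between rchar and read_bytes shows in the dd example above:
about 324 MB passed through read(), yet read_bytes is 0 because reads from
/dev/zero never reach the storage layer. A rough sketch of that comparison;
the function name is hypothetical, and the result is only an indicator,
since read_bytes can also exceed rchar (for example through readahead, or
through mmap'ed file access, which does not go through read() at all):

```python
def reads_not_from_storage(rchar, read_bytes):
    """Bytes requested via read()-style syscalls that were not fetched
    from the storage layer. Clamped at 0 because read_bytes may be
    larger than rchar (readahead, mmap)."""
    return max(rchar - read_bytes, 0)


# Values copied from the dd example above.
print(reads_not_from_storage(323934931, 0))
```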


write_bytes: (unsigned long long)

I/O counter: bytes written
Attempt to count the number of bytes which this process caused to be sent to
the storage layer. This is done at page-dirtying time.


cancelled_write_bytes: (unsigned long long)

The number of bytes of write IO which this process caused to not happen, by
truncating pagecache.

The big inaccuracy in write_bytes is truncate. If a process writes 1MB to a
file and then deletes the file, it will in fact perform no writeout, but it
will have been accounted as having caused 1MB of write. A task can cause
"negative" IO too: if it truncates some dirty pagecache, some IO which
another task has been accounted for (in its write_bytes) will not be
happening. We _could_ just subtract that from the truncating task's
write_bytes, but there is information loss in doing that, so it is reported
separately here.
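
A reader of the file can still do the subtraction. A minimal sketch with
made-up values modelled on the 1MB example above; net_write_bytes() is a
hypothetical name, and the result is only an estimate because the cancelled
bytes may have been dirtied by a different task:

```python
def net_write_bytes(write_bytes, cancelled_write_bytes):
    """Estimate of writeout actually attributable to a task.
    May be negative if the task truncated pagecache that other tasks
    had dirtied."""
    return write_bytes - cancelled_write_bytes


MB = 1024 * 1024

# Task writes 1MB and then deletes the file itself: no real writeout.
print(net_write_bytes(1 * MB, 1 * MB))

# Task wrote nothing but truncated 1MB that another task had dirtied.
print(net_write_bytes(0, 1 * MB))
```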


Note:

In its current implementation state, this is a bit racy on 32-bit machines: if
process A reads process B's /proc/pid/io while process B is updating one of
those 64-bit counters, process A could see an intermediate result.
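
What such an intermediate result can look like is shown by the following
simulation. It is plain arithmetic, not kernel code, and it assumes that a
32-bit machine stores a 64-bit counter as two separate 32-bit halves; which
half gets written first is an assumption too and depends on compiler and
architecture:

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit value into its (high, low) 32-bit halves."""
    return value >> 32, value & MASK32


def join64(high, low):
    """Reassemble a 64-bit value from two 32-bit halves."""
    return (high << 32) | low


old = 0x00000000FFFFFFFF   # counter just before the low half overflows
new = old + 1              # 0x0000000100000000

old_hi, old_lo = split64(old)
new_hi, new_lo = split64(new)

# Reader runs after the writer stored the new low half only.
torn_low_first = join64(old_hi, new_lo)
# Reader runs after the writer stored the new high half only.
torn_high_first = join64(new_hi, old_lo)

print(hex(old), hex(new))
print(hex(torn_low_first), hex(torn_high_first))
```

In the first case the reader sees the counter drop to 0, in the second it
sees a value almost twice the real one. Either way the next read is correct
again.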



More information about this can be found in the taskstats documentation in
Documentation/accounting.

--------------------------------------------------------------------------------




From: Andrew Morton <akpm@xxxxxxxx>

Add a simple /proc/pid/io to show the IO accounting fields.

Maybe this shouldn't be merged in mainline - the preferred reporting channel
is taskstats. But given the poor state of our userspace support for
taskstats, this is useful for developer-testing, at least. And it improves
the chances that the procps developers will wire it up into top(1). Opinions
are sought.

The patch also wires up the existing IO-accounting fields.

It's a bit racy on 32-bit machines: if process A reads process B's
/proc/pid/io while process B is updating one of those 64-bit counters, process
A could see an intermediate result.

Cc: Jay Lan <jlan@xxxxxxx>
Cc: Shailabh Nagar <nagar@xxxxxxxxxxxxxx>
Cc: Balbir Singh <balbir@xxxxxxxxxx>
Cc: Chris Sturtivant <csturtiv@xxxxxxx>
Cc: Tony Ernst <tee@xxxxxxx>
Cc: Guillaume Thouvenin <guillaume.thouvenin@xxxxxxxx>
Cc: David Wright <daw@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

fs/proc/base.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff -puN fs/proc/base.c~io-accounting-report-in-procfs fs/proc/base.c
--- a/fs/proc/base.c~io-accounting-report-in-procfs
+++ a/fs/proc/base.c
@@ -1804,6 +1804,27 @@ static int proc_base_fill_cache(struct f
proc_base_instantiate, task, p);
}

+#ifdef CONFIG_TASK_IO_ACCOUNTING
+static int proc_pid_io_accounting(struct task_struct *task, char *buffer)
+{
+	return sprintf(buffer,
+			"rchar: %llu\n"
+			"wchar: %llu\n"
+			"syscr: %llu\n"
+			"syscw: %llu\n"
+			"read_bytes: %llu\n"
+			"write_bytes: %llu\n"
+			"cancelled_write_bytes: %llu\n",
+			(unsigned long long)task->rchar,
+			(unsigned long long)task->wchar,
+			(unsigned long long)task->syscr,
+			(unsigned long long)task->syscw,
+			(unsigned long long)task->ioac.read_bytes,
+			(unsigned long long)task->ioac.write_bytes,
+			(unsigned long long)task->ioac.cancelled_write_bytes);
+}
+#endif
+
/*
* Thread groups
*/
@@ -1855,6 +1876,9 @@ static struct pid_entry tgid_base_stuff[
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
#endif
+#ifdef CONFIG_TASK_IO_ACCOUNTING
+	INF("io", S_IRUGO, pid_io_accounting),
+#endif
};

static int proc_tgid_base_readdir(struct file * filp,
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/