Re: Syslets, Threadlets, generic AIO support, v6

From: Zach Brown
Date: Tue May 29 2007 - 18:51:19 EST


> .. so don't keep us in suspense. Do you have any numbers for anything
> (like Oracle, to pick a random thing out of thin air ;) that might
> actually indicate whether this actually works or not?

I haven't gotten to running Oracle's database against it. It is going
to be Very Cranky if O_DIRECT writes aren't concurrent, and that's going
to take a bit of work in fs/direct-io.c.

I've done initial micro-benchmarking runs with fio for basic sanity
testing.  They haven't wildly regressed; that's about as much as can
be said with confidence so far :).

Take a streaming O_DIRECT read. 1meg requests, 64 in flight.

str: (g=0): rw=read, bs=1M-1M/1M-1M, ioengine=libaio, iodepth=64

mainline:

read : io=3,405MiB, bw=97,996KiB/s, iops=93, runt= 36434msec

aio+syslets:

read : io=3,452MiB, bw=99,115KiB/s, iops=94, runt= 36520msec

That's on an old gigabit copper FC array with 10 drives behind a, no
seriously, qla2100.
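
For anyone who wants to poke at this without fio, here's roughly what
that job's submission side boils down to in userspace -- a minimal
libaio sketch of 1MiB O_DIRECT reads with 64 in flight.  The device
path is made up and error handling is trimmed; it's an illustration,
not a benchmark harness.

#define _GNU_SOURCE			/* O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

#define DEPTH	64
#define BS	(1024 * 1024)

int main(void)				/* build with -laio */
{
	struct iocb iocbs[DEPTH], *ios[DEPTH];
	struct io_event events[DEPTH];
	io_context_t ctx;
	long long off = 0;
	int fd, i, n;

	fd = open("/dev/sdX", O_RDONLY | O_DIRECT);	/* path made up */

	memset(&ctx, 0, sizeof(ctx));
	io_setup(DEPTH, &ctx);		/* kernel AIO context, as fio does */

	for (i = 0; i < DEPTH; i++) {
		void *buf;

		posix_memalign(&buf, 4096, BS);	/* O_DIRECT wants aligned buffers */
		io_prep_pread(&iocbs[i], fd, buf, BS, off);
		ios[i] = &iocbs[i];
		off += BS;
	}
	io_submit(ctx, DEPTH, ios);	/* 64 requests in flight at once */

	for (;;) {
		n = io_getevents(ctx, 1, DEPTH, events, NULL);
		for (i = 0; i < n; i++) {
			struct iocb *io = events[i].obj;

			/* reuse the completed request's buffer at the next offset */
			io_prep_pread(io, fd, io->u.c.buf, BS, off);
			off += BS;
			io_submit(ctx, 1, &io);
		}
	}
}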

The real test is the change in memory and CPU consumption, and I
haven't modified fio to take reasonably precise measurements of those
yet.  Once I get O_DIRECT writes concurrent, that'll be the next step.

I was pleased to see my motivation for the patches work out: we avoid
having to add support to fs/aio.c for each specific operation we want
to be able to call asynchronously.

Take the case of 4k random buffered reads from a block device with a
cold cache:

read: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64

mainline:

read : io=16,116KiB, bw=457KiB/s, iops=111, runt= 36047msec
slat (msec): min= 4, max= 629, avg=563.17, stdev=71.92
clat (msec): min= 0, max= 0, avg= 0.00, stdev= 0.00

aio+syslets:

read : io=125MiB, bw=3,634KiB/s, iops=887, runt= 36147msec
slat (msec): min= 0, max= 3, avg= 0.00, stdev= 0.08
clat (msec): min= 2, max= 643, avg=71.59, stdev=74.25

aio+syslets w/o cfq:

read : io=208MiB, bw=6,057KiB/s, iops=1,478, runt= 36071msec
slat (msec): min= 0, max= 15, avg= 0.00, stdev= 0.09
clat (msec): min= 2, max= 758, avg=42.75, stdev=37.33

Everyone step back and thank Jens for writing a tool that gives us
interesting data without us always having to craft some stupid specific
test each and every time. Thanks, Jens!

In the mainline numbers fio clearly shows the buffered read
submissions being handled synchronously: slat averages over half a
second while clat is zero, because each submission blocks until its
read has completed.  The mainline buffered IO paths don't know how to
identify and work with iocbs, so requests are handled in series.

In the +syslet numbers the latencies flip: slat drops to almost
nothing and clat absorbs the wait, because __async_schedule() catches
the blocking buffered read and lets the submission proceed
asynchronously.  We get async behaviour without having to touch any
of the buffered IO paths.
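
To make that concrete, here's the shape of the thing as I think of it
-- a conceptual sketch only, not code from the patches.  The helpers
async_arm() and went_async() are made up, and the kiocb field names
are approximate:

static long submit_one(struct kiocb *iocb)
{
	long ret;

	async_arm(current);	/* made up: arm the scheduler's blocking hook */

	/* the completely unmodified buffered read path */
	ret = vfs_read(iocb->ki_filp, iocb->ki_buf, iocb->ki_nbytes,
		       &iocb->ki_pos);

	/*
	 * If the read never blocked, this behaved like a synchronous
	 * syscall.  If it did block, __async_schedule() already handed
	 * the return to userspace off to a cached thread, so
	 * io_submit() returned long ago; all that's left here is to
	 * record the completion.
	 */
	if (went_async(current))	/* made up */
		aio_complete(iocb, ret, 0);

	return ret;
}

The buffered IO path never learns about iocbs at all; the scheduler
hook does all the work.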

Then we turn off cfq and we actually start to saturate the (relatively
ancient) drives :).

I need to mail Jens about that cfq behaviour, but I'm guessing it's
expected behaviour of a sort -- each syslet thread gets its own
io_context instead of inheriting it from its parent, so cfq treats
the threads as unrelated processes and time-slices between them
instead of seeing one submitter with a deep queue.
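
If that's what's happening, the fix is presumably to share rather
than allocate -- something along these lines, purely hypothetical,
with function and field names approximate:

static void syslet_share_io_context(struct task_struct *parent,
				    struct task_struct *child)
{
	struct io_context *ioc = parent->io_context;

	/*
	 * Take a reference on the submitter's io_context instead of
	 * letting the new thread lazily allocate its own, so cfq sees
	 * one context with real queue depth rather than a swarm of
	 * depth-1 "processes" to idle between.
	 */
	if (ioc) {
		atomic_inc(&ioc->refcount);
		child->io_context = ioc;
	}
}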

- z