Re: A quick fio test (was Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)

From: Suparna Bhattacharya
Date: Mon Feb 26 2007 - 23:28:08 EST


On Mon, Feb 26, 2007 at 03:45:48PM +0100, Jens Axboe wrote:
> On Mon, Feb 26 2007, Suparna Bhattacharya wrote:
> > On Mon, Feb 26, 2007 at 02:57:36PM +0100, Jens Axboe wrote:
> > >
> > > Some more results, using a larger number of processes and IO depths. A
> > > repeat of the tests from Friday, with added depth 20000 for syslet and
> > > libaio:
> > >
> > > Engine      Depth   Processes   Bw (MiB/sec)
> > > ----------------------------------------------------
> > > libaio          1           1        602
> > > syslet          1           1        759
> > > sync            1           1        776
> > > libaio         32           1        832
> > > syslet         32           1        898
> > > libaio      20000           1        581
> > > syslet      20000           1        609
> > >
> > > syslet still on top. Measuring O_DIRECT reads (of 4KB size) on ramfs
> > > with 100 processes, each with a depth of 200, reading a per-process
> > > private file of 10MiB (it needs to fit in my RAM...) 10 times each. IOW,
> > > doing 10,000 MiB of IO in total:
> >
> > But why ramfs? Don't we want to exercise the case where O_DIRECT actually
> > blocks? Or am I missing something here?
>
> Just overhead numbers for that test case; let's try something like the
> job you described.
>
> Test case is doing random reads from /dev/sdb, in chunks of 64kb:
>
> Engine      Depth   Processes   Bw (KiB/sec)
> ----------------------------------------------------
> libaio        200         100       2813
> syslet        200         100       3944
> libaio      20000           1       2793
> syslet      20000           1       3854
> sync (*)    20000           1       2866
>
> deadline was used for IO scheduling, to minimize impact. I'm not sure why
> syslet actually does so much better here; looking at vmstat, the rate is
> steady and all runs are basically 50/50 idle/wait. One difference is
> that the submission itself takes a long time with libaio, since
> io_submit() will block on request allocation. The generated IO pattern
> from each process is the same for all runs. The drive is a lousy SATA
> disk that doesn't even do queuing, FWIW.
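
For reference, a fio job along the lines described above might look roughly
like this (a sketch only, not the actual job file used: the device, block
size, depth and process count are taken from the description above; the
syslet engine name and the use of direct I/O are assumptions):

# sketch of the random-read test described above (illustrative values)
[global]
# ioengine=syslet-rw for the syslet runs (engine name assumed)
ioengine=libaio
rw=randread
bs=64k
# direct I/O assumed, as in the earlier ramfs runs
direct=1
iodepth=200
numjobs=100

[sdb-randread]
filename=/dev/sdb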


I tried the latest fio code with syslet v4, and my results are a little
different - I have yet to figure out why, or what to make of them.
I hope I have all the right pieces now.

This is on an ext2 filesystem, with a SCSI AIC7xxx controller.

I used an iodepth_batch size of 8 to limit the number of I/Os in a single
io_submit() (thanks for adding that parameter to fio!), like we did in
aio-stress.
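
A rough sketch of what such a job file looks like (illustrative only: the
file name, size, block size and access pattern are placeholders rather than
the values actually used; only the iodepth and iodepth_batch settings are
the ones under discussion):

# sketch: depth 64, at most 8 iocbs handed to the kernel per io_submit()
[global]
ioengine=libaio
rw=randread
bs=64k
direct=1
iodepth=64
iodepth_batch=8

[ext2-testfile]
# placeholder path and size on the ext2 filesystem
filename=/mnt/ext2/fio.testfile
size=1g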

Engine      Depth   Batch     Bw (KiB/sec)
----------------------------------------------------
libaio         64       8       17,226
syslet         64       8       17,620
libaio      20000       8       18,552
syslet      20000       8       14,935


Which is not bad, actually.

If I do not specify iodepth_batch (i.e. it defaults to the full depth), then
the difference becomes more pronounced at higher depths. However, I doubt
whether anyone would be using such high batch sizes in practice ...

Engine      Depth   Batch     Bw (KiB/sec)
----------------------------------------------------
libaio         64   default     17,429
syslet         64   default     16,155
libaio      20000   default     15,494
syslet      20000   default      7,971

Oftentimes it is the application tuning that makes all the difference,
so I am not really sure how much to read into these results.
That has always been the hard part of async IO ...

Regards
Suparna

>
> [*] Just for comparison, the depth is obviously really 1 at the kernel
> side since it's sync.
>
> --
> Jens Axboe

--
Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India
