Re: Read starvation by sync writes

From: Jan Kara
Date: Wed Dec 12 2012 - 05:10:58 EST

In reply to: Shaohua Li: "Re: Read starvation by sync writes"
Next in thread: Jens Axboe: "Re: Read starvation by sync writes"

On Wed 12-12-12 10:55:15, Shaohua Li wrote:
> 2012/12/11 Jan Kara <jack@xxxxxxx>:
> > Hi,
> >
> > I was looking into IO starvation problems where streaming sync writes (in
> > my case from kjournald but DIO would look the same) starve reads. This is
> > because reads happen in small chunks and until a request completes we don't
> > start reading further (reader reads lots of small files) while writers have
> > plenty of big requests to submit. Both processes end up fighting for IO
> > requests and writer writes nr_batching 512 KB requests while reader reads
> > just one 4 KB request or so. Here the effect is magnified by the fact that
> > the drive has relatively big queue depth so it usually takes longer than
> > BLK_BATCH_TIME to complete the read request. The net result is that it
> > takes close to two minutes to read files that can be read in under a second
> > without writer load. Without the drive's big queue depth, results are not
> > ideal but they are bearable - it takes about 20 seconds to do the reading. And for
> > comparison, when writer and reader are not competing for IO requests (as it
> > happens when writes are submitted as async), it takes about 2 seconds to
> > complete reading.
> >
> > Simple reproducer is:
> >
> > echo 3 >/proc/sys/vm/drop_caches
> > dd if=/dev/zero of=/tmp/f bs=1M count=10000 &
> > sleep 30
> > time cat /etc/* 2>&1 >/dev/null
> > killall dd
> > rm /tmp/f
> >
> > The question is how can we fix this? Two quick hacks that come to my mind
> > are to remove the timeout from the batching logic (is it that important?) or
> > to further separate the request allocation logic so that reads have their
> > own request pool. A more systematic fix would be to change the request
> > allocation logic to always allow at least a fixed number of requests per
> > IOC. What do people think about this?
>
> As long as queue depth > workload iodepth, there is little we can do
> to prioritize tasks/IOCs, because throttling a task/IOC means the queue
> will be idle. We don't want to idle a queue (especially for SSD), so
> we always push as many requests as possible to the queue, which
> will break any prioritization. As far as I know, we have always had this
> issue in CFQ for disks with a big queue depth.
Yes, I understand that. But a big queue depth on its own doesn't actually
make the problem really bad (at least for me). When the reader doesn't have
to wait for free IO requests, it progresses at a reasonable speed. What
makes it really bad is that the big queue depth effectively prevents the
reader from ever using ioc_batching() mode, so it blocks in request
allocation for every single read request, unlike the writer, which always
uses its full batch (32 requests).
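
To put rough numbers on that imbalance, here is a back-of-the-envelope
sketch in Python. It uses only the figures mentioned in this thread (a batch
of 32 requests, 512 KB writes, 4 KB reads) and assumes the simplified
picture in which the writer submits one full batch for every single read
request the reader gets through. The names are made up for illustration;
this is not kernel code.

```python
# Figures taken from the thread; names are illustrative, not kernel identifiers.
BATCH_REQUESTS = 32      # writer's full batch, as stated in the thread
WRITE_REQUEST_KB = 512   # size of each streaming write request
READ_REQUEST_KB = 4      # size of a typical small read request


def data_per_round_kb(requests: int, request_kb: int) -> int:
    """Amount of data (in KB) submitted in one allocation round."""
    return requests * request_kb


# Assumption: one "round" = the writer uses its whole batch while the
# reader gets exactly one request allocated.
writer_kb = data_per_round_kb(BATCH_REQUESTS, WRITE_REQUEST_KB)
reader_kb = data_per_round_kb(1, READ_REQUEST_KB)
ratio = writer_kb // reader_kb

print(f"writer: {writer_kb} KB per round")
print(f"reader: {reader_kb} KB per round")
print(f"writer submits {ratio}x more data per round")
```

Under these assumptions the writer queues 16 MB for every 4 KB the reader
manages to submit, which is the kind of gap that would explain the reader
crawling even though each of its requests is tiny.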

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
