Re: [GIT PULL] Ext3 latency fixes

From: Jens Axboe
Date: Sat Apr 04 2009 - 13:36:55 EST


On Sat, Apr 04 2009, Linus Torvalds wrote:
>
>
> On Sat, 4 Apr 2009, Linus Torvalds wrote:
>
> >
> >
> > On Sat, 4 Apr 2009, Jens Axboe wrote:
> > >
> > > Big nack on this patch. Ted, this is EXACTLY where I told you we saw big
> > > write regressions (sqlite performance drops by a factor of 4-5). Do a
> > > git log on fs/buffer.c and see the original patch (which does what your
> > > patch does) and the later revert. No idea why you are now suggesting
> > > making that exact change?!
> >
> > Jens, if I can re-create the 'fsync' times (I haven't yet), then the
> > default scheduler _will_ be switched to AS.
>
> Btw, that patch is "obviously correct".
>
> That write we're submitting is very much a synchronous write. After all,
> the code is literally
>
> ret = submit_bh(WRITE, bh);
> wait_on_buffer(bh);
>
> and it just doesn't get any more synchronous than that. If we don't start
> the IO immediately (since we're _waiting_ for it immediately), we're
> broken.
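Why the request type matters for the latency of this submit-then-wait pattern can be shown with a small userspace model. It assumes the fix amounts to flagging this write as synchronous, and it uses a hypothetical toy scheduler that always dispatches queued sync requests before async ones, one request per tick. This is not CFQ's actual algorithm. The names `toy_queue`, `toy_submit`, `toy_wait_ticks` and `toy_sync_write_latency` are invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: two FIFO queues where
 * every pending sync request is dispatched before any async one. */
enum toy_req_type { TOY_ASYNC, TOY_SYNC };

#define TOY_MAX_REQS 64

struct toy_fifo {
    int ids[TOY_MAX_REQS];
    int head, tail;
};

struct toy_queue {
    struct toy_fifo sync, async;
};

static void toy_init(struct toy_queue *q)
{
    memset(q, 0, sizeof(*q));
}

static void toy_submit(struct toy_queue *q, int id, enum toy_req_type type)
{
    struct toy_fifo *f = (type == TOY_SYNC) ? &q->sync : &q->async;

    assert(f->tail < TOY_MAX_REQS); /* the toy model has a fixed capacity */
    f->ids[f->tail++] = id;
}

/* Dispatch one request (sync queue first); return its id, or -1 if idle. */
static int toy_dispatch_one(struct toy_queue *q)
{
    if (q->sync.head < q->sync.tail)
        return q->sync.ids[q->sync.head++];
    if (q->async.head < q->async.tail)
        return q->async.ids[q->async.head++];
    return -1;
}

/* Model of "submit, then wait": count dispatch ticks until `id` is done. */
static int toy_wait_ticks(struct toy_queue *q, int id)
{
    int ticks = 0;
    int done;

    while ((done = toy_dispatch_one(q)) != -1) {
        ticks++;
        if (done == id)
            return ticks;
    }
    return -1; /* the request was never submitted */
}

/* Queue `backlog` unrelated async writes, then one write of `type`,
 * and return how many ticks the waiter is blocked on that write. */
static int toy_sync_write_latency(int backlog, enum toy_req_type type)
{
    struct toy_queue q;
    const int waited_id = 1000; /* distinct from the backlog ids 0..backlog-1 */

    toy_init(&q);
    for (int i = 0; i < backlog; i++)
        toy_submit(&q, i, TOY_ASYNC);
    toy_submit(&q, waited_id, type);
    return toy_wait_ticks(&q, waited_id);
}
```

With a backlog of 10 async writes, `toy_sync_write_latency(10, TOY_ASYNC)` returns 11 while `toy_sync_write_latency(10, TOY_SYNC)` returns 1: in this model the waiter's latency grows with unrelated async traffic unless its write is marked sync. Real CFQ behavior (time slices, idling, async depth limits) is considerably more involved.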
>
> Now, if we need to fix some mysql throughput issue as a result, then I'd
> suggest that we look at (a) whether "sync_dirty_buffer()" is sometimes called
> when it doesn't need to be, or (b) whether perhaps the unplugging behavior
> is simply buggy in some other way.
>
> But Ted's patch makes so much sense on a purely conceptual level, that
> when you look at the patch, you should almost not even need to see the
> performance numbers to know it's right. But together with the numbers Ted
> posted, it's a total no-brainer. CFQ is clearly broken here, and it's
> pretty clear that apparently CFQ has been tuned (improperly) purely for
> throughput.

I agree; that's why I previously wrote and submitted an IDENTICAL patch.
I'll go and test things. I'm not sure why you think AS and CFQ perform
very differently here; to my knowledge, no such results have been posted
for this test case. And I'll state again that if they do differ, of course
I'll look into that and fix it.

One thing that may be of concern is the immediate unplug, but on the
other hand, we do an immediate wait, which would unplug anyway. So if we
just look at the sqlite regression, I'm sure it'll be pretty easy to
pinpoint with some blktrace data.
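The unplug argument can be sketched under a deliberately simplified model: a plugged queue holds a submitted request until the submitter unplugs it, the waiter unplugs it, or an unplug timer fires. `toy_dispatch_tick()` and its parameters are hypothetical and do not correspond to block-layer functions; times are arbitrary ticks with submission at tick 0.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of queue plugging (not the block layer's API).
 *   unplug_on_submit: the submitter unplugs right after submitting
 *   wait_after:       ticks between submission and the wait, which unplugs
 *   timer_delay:      ticks after which a plugged queue unplugs by itself
 * Returns the tick at which the request is dispatched to the device.
 */
static int toy_dispatch_tick(bool unplug_on_submit, int wait_after,
                             int timer_delay)
{
    assert(wait_after >= 0 && timer_delay > 0);

    if (unplug_on_submit)
        return 0;
    /* Otherwise the request stays plugged until whichever comes first:
     * the waiter unplugging the queue, or the unplug timer firing. */
    return wait_after < timer_delay ? wait_after : timer_delay;
}
```

With `wait_after == 0`, which models the submit_bh()/wait_on_buffer() sequence quoted earlier, both `toy_dispatch_tick(true, 0, 3)` and `toy_dispatch_tick(false, 0, 3)` return 0, so in this model the immediate unplug by itself changes nothing. It only makes a difference when the submitter does other work before waiting.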

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/