Re: [PATCH] Improve buffered streaming write ordering

From: Chris Mason
Date: Fri Oct 03 2008 - 08:08:22 EST


On Fri, 2008-10-03 at 12:43 +1000, Nick Piggin wrote:
> On Friday 03 October 2008 11:11, Chris Mason wrote:

> > > Part of that can happen due to shrink_page_list -> pageout -> writepage
> > > call back with lots of unallocated buffer_heads(blocks). Also a journal
> > > commit with jbd2 looks at the inode and all the dirty pages, rather than
> > > the buffer_heads (journal_submit_data_buffers). We don't force commit
> > > pages that don't have blocks allocated with ext4. The consistency
> > > is only with i_size and data.
> >
> > In general, I don't think pdflush or the VM expect
> > redirty_page_for_writepage to be used this aggressively.
>
> BTW. redirty_page_for_writepage and the whole model of cleaning the page's
> dirty bit *before* calling into the filesystem is really nasty IMO. For
> one thing it opens races that mean a filesystem can't keep metadata about
> the pagecache properly in synch with the page's dirty bit.
>
> I have a patch in my fsblock series that fixes this and has the writepage()
> function itself clear the page's dirty bit. This basically makes
> redirty_page_for_writepage go away completely (at least the uses I looked
> at, I didn't look at ext4 though).
>
> Shall I break it out and submit it?

It's a fair amount of churn in the FS code, and I'm not sure whether the
bigger problem is the lock ordering between the page lock and the FS
locks, or just the dirty bit.

Personally I'd rather see writepages used everywhere, giving the FS the
chance to do more efficient IO.
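
To make the efficiency point concrete, here is a minimal user-space
sketch, not kernel code. The names ios_per_page and ios_batched are made
up for illustration, and the model assumes the only cost that matters is
the number of IOs submitted: a per-page callback issues one IO per dirty
page, while a callback that sees the whole dirty range can issue one IO
per run of contiguous page indexes.

```c
/* Toy user-space model; every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>

/* One IO per dirty page, as a per-page writepage-style callback
 * would issue when it is handed pages one at a time. */
static size_t ios_per_page(const unsigned long *idx, size_t n)
{
	(void)idx;	/* the indexes do not matter, only the count */
	return n;
}

/* One IO per run of contiguous page indexes, as a writepages-style
 * callback could issue when it sees all dirty pages at once.
 * idx must be sorted ascending with no duplicates. */
static size_t ios_batched(const unsigned long *idx, size_t n)
{
	size_t ios = 0;

	for (size_t i = 0; i < n; i++) {
		/* enforce the sorted, duplicate-free precondition */
		assert(i == 0 || idx[i] > idx[i - 1]);
		/* a new IO starts wherever the run of indexes breaks */
		if (i == 0 || idx[i] != idx[i - 1] + 1)
			ios++;
	}
	return ios;
}
```

With dirty page indexes 0, 1, 2, 3, 8, 9, 20 the per-page model submits
seven IOs and the batched model submits three. Real filesystems have
more to consider (block allocation, extent limits, locking), so this
only illustrates the batching opportunity.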

-chris


