Re: [PATCH 2/3] mm: Implement writeback livelock avoidance usingpage tagging

From: Jan Kara
Date: Mon Feb 15 2010 - 14:57:13 EST


On Tue 16-02-10 03:21:27, Nick Piggin wrote:
> On Mon, Feb 15, 2010 at 04:47:51PM +0100, Jan Kara wrote:
> > On Fri 12-02-10 11:39:55, Andrew Morton wrote:
> > > On Fri, 12 Feb 2010 00:06:23 +0100
> > > Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > > The idea is simple: Tag all pages that should be written back
> > > > with a special tag (TOWRITE) in the radix tree. This can be done
> > > > rather quickly and thus livelocks should not happen in practice.
> > > > Then we start doing the hard work of locking pages and sending
> > > > them to disk only for those pages that have TOWRITE tag set.
> > >
> > > Adding a second pass across all the pages sounds expensive?
> > Strictly speaking it's just through the radix tree and only through
> > branches with DIRTY_TAG set. But yes, there is some additional CPU cost.
> > I just thought that given the total cost of submitting a page it is
> > an acceptable increase and the simplification is worth it.
> > Would some numbers make you happier? Any suggestion for measurements?
> > Because I think that even for writes to tmpfs the change will be lost
> > in the noise...
>
> Although hmm, if it is a very large file with *lots* of dirty pages
> then it might become a noticable proportion of the cost.
>
> Dave Chinner would probably tell you he's seen files with many
> gigabytes dirty, and what is nr_to_write set to? 1024 is it? So you
> might be tagging hundreds or thousands of radix tree entries per
> page you write.
Hmm, this is a good point. My final aim is to remove the nr_to_write
logic so then the only case when we break out of the loop in
write_cache_pages early would be when ->writepage returns an error.
But until then what you describe can happen quite regularly.
An obvious workaround is to limit the amount of tags transferred but that
might have to repeat retagging and it just gets ugly. Another solution
would be to remove the DIRTY tag as we set the TOWRITE tag. That might
actually work quite nicely what do you think?
Finally, I could complement this series with patches substituting
nr_to_write logic but that would make the series considerably more
complicated and controverse so I rather wanted to do it in small steps...

> Also, I wonder what you think about leaving the tags dangling when
> the loop bails out early? I have a *slight* concern about this
> because previously we never have a tag set when radix_tree_delete
> is called. I actually had a bug in that code in earlier versions
> of rcu radix tree that only got found by the user test harness.
> And another slight concern that it is just a bit ugly to leave the
> tag. But I can accept that lower CPU overhead trumps ugliness :)
I specifically checked __remove_from_page_cache -> radix_tree_delete
path to verify what happens and everything seems to handle that just
fine. BTW, note that cancel_dirty_page does not clear the radix tree
dirty tag either so during truncation radix_tree_delete will be called
with tagged pages as well.

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/