Re: [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.

From: Andrew Morton
Date: Sat May 13 2006 - 03:02:25 EST


Neil Brown <neilb@xxxxxxx> wrote:
>
> On Friday May 12, akpm@xxxxxxxx wrote:
> > NeilBrown <neilb@xxxxxxx> wrote:
> > >
> > > ./drivers/md/bitmap.c | 115 ++----------------------------------------
> >
> > hmm. I hope we're not doing any of that filesystem I/O within the context
> > of submit_bio() or kblockd or anything like that. Looks OK from a quick
> > scan.
>
> No. We do all the I/O from the context of the per-array thread.

OK.

> However some IO requests cannot complete until the filesystem I/O
> completes, so we need to be sure that the filesystem I/O won't block
> waiting for memory, or fail with -ENOMEM.

That sounds like a complex deadlock. Suppose the bitmap writeout requres
some writeback to happen before it can get enough memory to proceed.

> bitmap.c is currently trying to do something every different.
> It uses ->readpage to get pages in the page cache (even though some
> address spaces don't define ->readpage) and then holds onto those
> pages without holding the page lock, and then calls ->writepage to
> flush them out from time to time.

That's OK.

> Before calling writepage it gets the pagelock, but doesn't re-check
> that ->mapping is correct (there is nothing much it can do if it isn't
> correct..).

We do that to find out if the page was truncated while we ewre waiting for
the page lock.

> I noticed this is particularly a problem with tmpfs. When you call
> writepage on a tmpfs page, the page is swizzled into the swap cache,
> and ->mapping becomes NULL - not the behaviour that bitmap is
> expecting.

You shouldn't call writepage on tmpfs. If !bdi_cap_writeback_dirty(), your
write_page() should immediately give up and "succeed". Because this
filesystem has no permanent backing store anyway.

tmpfs doesn't implement bmap(), so your new patch fixes that problem ;)

> > We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE. Same diff.
> >
>
> Yeah.... why is that? Why have two names for exactly the same value?

I guess at one time it was thought that we could use a different unit size
for pagecache. Nowadays we use it by convention - it has little more than
documentary value.

> How does a poor develop know when to use one and when the other? More
> particularly, how does one remember.

If the page is a pagecache page, use PAGE_CACHE_SIZE. There are a few
spots where it blurs, but not many.

> > If you have a page and you want to write the whole thing out then there's
> > really no need to run prepare_write or commit_write at all. Just
> > initialise the whole page, run set_page_dirty() then write_one_page().
> >
>
> I see that now. But only after locking the page, and rechecking that
> ->mapping is correct, and if it isn't .... well, more work is involved
> that bitmap is in a position to do.

If you've bumped i_writecount to block truncate then there's no need to
check for truncation after locking the page.

i_mutex could be used similarly.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/