Re: [PATCH] [13/16] POISON: The high level memory error handler in the VM II

From: Wu Fengguang
Date: Wed Apr 29 2009 - 04:17:24 EST


On Thu, Apr 09, 2009 at 10:37:39AM -0400, Chris Mason wrote:
> On Thu, 2009-04-09 at 16:02 +0200, Andi Kleen wrote:
> > On Thu, Apr 09, 2009 at 09:30:29AM -0400, Chris Mason wrote:
> > > > Is that a correct assumption?
> > >
> > > Yes, the page won't become writeback when you're holding the page lock.
> > > But, the FS usually thinks of try_to_releasepage as a polite request.
> > > It might fail internally for a bunch of reasons.
> > >
> > > To make things even more fun, the page won't become writeback magically,
> > > but ext3 and reiser maintain lists of buffer heads for data=ordered, and
> > > they do the data=ordered IO on the buffer heads directly. writepage is
> > > never called and the page lock is never taken, but the buffer heads go
> > > to disk. I don't think any of the other filesystems do it this way.
> >
> > Ok, so do you think my code handles this correctly?
>
> Even though try_to_releasepage only checks page_writeback() the lower
> filesystems all bail on dirty pages or dirty buffers (see the checks
> done by try_to_free_buffers).
>
> It looks like the only way we have to clean a page and all the buffers
> in it is the invalidatepage call. But that doesn't return success or
> failure, so maybe invalidatepage followed by releasepage?
>
> I'll have to read harder next week, the FS invalidatepage may expect
> truncate to be the only caller.

If direct de-dirtying is hard for some pages, how about just ignoring them?
There are PG_writeback pages to deal with anyway. We can inject code to
intercept them at the last stage of IO request dispatching.

Some foreseeable problems and solutions:
1) the interception overhead could be costly => inject the code at runtime.
2) there are cases where the dirty page could be copied for IO:
   2.1) jbd2 has two copy-out cases => should be rare. Just ignore them?
        2.1.1) do_get_write_access(): the buffer sits in two active commits
        2.1.2) jbd2_journal_write_metadata_buffer(): the buffer happens to
               start with JBD2_MAGIC_NUMBER
   2.2) btrfs has to read the page for compression/encryption
        Chris: is btrfs_zlib_compress_pages() a good place for detecting
        poison pages? Or is it necessary at all for btrfs? (i.e. it's
        already relatively easy to de-dirty btrfs pages.)
   2.3) maybe more cases...
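
To make the interception idea concrete, here is a rough userspace model,
not kernel code. Every name in it (model_page, MODEL_PG_*,
model_enable_intercept, model_dispatch_io) is made up for illustration,
and the flag values are not the kernel's PG_* bits. It only shows the
shape of the check: it is switched on at runtime once an error has been
reported, and a poisoned page is then refused at the last dispatch step.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; NOT the kernel's PG_* values. */
enum model_page_flag {
	MODEL_PG_DIRTY     = 1u << 0,
	MODEL_PG_WRITEBACK = 1u << 1,
	MODEL_PG_POISON    = 1u << 2,
};

/* Hypothetical stand-in for struct page. */
struct model_page {
	unsigned int flags;
};

/*
 * Stays false until a memory error has been reported, so the common
 * path only pays for one cheap test (problem 1 above). A real
 * implementation would presumably patch the check in at runtime.
 */
static bool model_intercept_enabled;

static void model_enable_intercept(void)
{
	model_intercept_enabled = true;
}

/*
 * Model of the last stage of IO dispatch.
 * Returns -EIO without touching the page if it is poisoned and the
 * intercept is enabled; otherwise pretends the write completed and
 * clears the dirty and writeback bits.
 */
static int model_dispatch_io(struct model_page *page)
{
	assert(page != NULL);

	if (model_intercept_enabled && (page->flags & MODEL_PG_POISON))
		return -EIO;

	page->flags &= ~(unsigned int)(MODEL_PG_DIRTY | MODEL_PG_WRITEBACK);
	return 0;
}
```

Note that this does nothing for the copy-out cases in 2): once the data
has been copied into another buffer, a per-page check like this cannot
see it.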

> >
> > > If we really want the page gone, we'll have to tell the FS
> > > drop-this-or-else....sorry, its some ugly stuff.
> >
> > I would like to give a very strong hint at least. If it fails
> > we can still ignore it, but it will likely have negative consequences later.
> >
>
> Nod.
>
> > >
> > > The good news is, it is pretty rare. I wouldn't hold up the whole patch
> >
> > You mean pages with Private bit are rare? Are you suggesting to just
> > ignore those? How common is it to have Private pages which are not
> > locked by someone else?
> >
>
> PagePrivate is very common. try_to_releasepage failing on a clean page
> without the writeback bit set and without dirty/locked buffers will be
> pretty rare.

Yup. btrfs seems to tag most (if not all) dirty pages with PG_private,
while ext4 doesn't.
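
As a summary of the conditions discussed above, here is a small userspace
model of when dropping a page should be expected to work. It is only a
sketch: sketch_page, sketch_buffer and sketch_can_drop_cleanly() are
invented names, and the real checks live in the filesystems and in
try_to_free_buffers(), which may bail for other reasons as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer head attached to a page. */
struct sketch_buffer {
	bool dirty;
	bool locked;
};

/* Hypothetical stand-in for struct page plus its buffers. */
struct sketch_page {
	bool dirty;
	bool writeback;
	bool has_private;                 /* models PG_private */
	const struct sketch_buffer *bufs; /* only used if has_private */
	size_t nr_bufs;
};

/*
 * Returns true when none of the obstacles named in this thread apply:
 * the page is not under writeback, not dirty, and has no dirty or
 * locked buffers. A false result means the release is expected to fail.
 */
static bool sketch_can_drop_cleanly(const struct sketch_page *page)
{
	size_t i;

	assert(page != NULL);

	if (page->writeback || page->dirty)
		return false;
	if (!page->has_private)
		return true; /* no FS private data to release */

	for (i = 0; i < page->nr_bufs; i++) {
		if (page->bufs[i].dirty || page->bufs[i].locked)
			return false;
	}
	return true;
}
```

The rare case Chris mentions is the one where this model returns true
and the filesystem still refuses to release the page.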

Thanks,
Fengguang
