Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

From: Wu Fengguang
Date: Sun Jun 07 2009 - 12:02:48 EST


On Thu, Jun 04, 2009 at 02:25:24PM +0800, Nai Xia wrote:
> On Thu, May 28, 2009 at 10:50 PM, Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> > On Thu, May 28, 2009 at 09:45:20PM +0800, Andi Kleen wrote:
> >> On Thu, May 28, 2009 at 02:08:54PM +0200, Nick Piggin wrote:
> >
> > [snip]
> >
> >> >
> >> > BTW. I don't know if you are checking for PG_writeback often enough?
> >> > You can't remove a PG_writeback page from pagecache. The normal
> >> > pattern is lock_page(page); wait_on_page_writeback(page); which I
> >>
> >> So pages can be in writeback without being locked? I still
> >> wasn't able to find such a case (in fact unless I'm misreading
> >> the code badly the writeback bit is only used by NFS and a few
> >> obscure cases)
> >
> > Yes the writeback page is typically not locked. Only read IO requires
> > exclusive access. Read IO is in fact the page *writer*, while writeback
> > IO is the page *reader* :-)
>
> Sorry for maybe somewhat a little bit off topic,
> I am trying to get a good understanding of PG_writeback & PG_locked ;)
>
> So you are saying PG_writeback & PG_locked are acting like a read/write lock?
> I notice wait_on_page_writeback(page) seems always called with page locked --

No. Note that pages are not locked in wait_on_page_writeback_range().

> that is the semantics of a writer waiting to get the lock while it's
> acquired by some readers: the callers (e.g. truncate_inode_pages_range() and
> invalidate_inode_pages2_range()) are the writers waiting for
> writeback readers (as you clarified) to finish their job, right?

Sorry if my metaphor confused you. This is not a typical
reader/writer problem; it is really about data integrity.

Pages must not be under writeback when they are truncated.
Otherwise data loss is possible:

1) create a file with one page (page A)
2) truncate page A that is under writeback
3) write to file, which creates page B
4) sync file, which sends page B to disk quickly

Now if page B reaches disk before A, the new data will be overwritten
by truncated old data, which corrupts the file.

> So do you think the idea is sane to group the two bits together
> to form a real read/write lock, which does not care about the _number_
> of readers ?

We don't care about the number of readers here, so please forget about that.

Thanks,
Fengguang

> > The writeback bit is _widely_ used. test_set_page_writeback() is
> > directly used by NFS/AFS etc. But its main user is in fact
> > set_page_writeback(), which is called in 26 places.
> >
> >> > think would be safest
> >>
> >> Okay. I'll just add it after the page lock.
> >>
> >> > (then you never have to bother with the writeback bit again)
> >>
> >> Until Fengguang does something fancy with it.
> >
> > Yes I'm going to do it without wait_on_page_writeback().
> >
> > The reason truncate_inode_pages_range() has to wait on writeback pages
> > is to ensure data integrity. Otherwise there could come two events:
> >        truncate page A at offset X
> >        populate page B at offset X
> > If A and B are both writeback pages, then B can hit disk first and then
> > be overwritten by A, which corrupts the data at offset X from the user's POV.
> >
> > But for hwpoison, there are no such worries. If A is poisoned, we do
> > our best to isolate it as well as intercepting its IO. If the interception
> > fails, it will trigger another machine check before hitting the disk.
> >
> > After all, poisoned A means the data at offset X is already corrupted.
> > It doesn't matter if there comes another B page.
> >
> > Thanks,
> > Fengguang
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >