Re: 2.6.19 file content corruption on ext3

From: Peter Zijlstra
Date: Tue Dec 19 2006 - 04:07:37 EST


On Tue, 2006-12-19 at 10:00 +0100, Peter Zijlstra wrote:
> On Tue, 2006-12-19 at 00:04 -0800, Linus Torvalds wrote:
>
> > Nobody has actually ever explained why "test_clear_page_dirty()" is good
> > at all.
> >
> > - Why is it ever used instead of "clear_page_dirty_for_io()"?
> >
> > - What is the difference?
> >
> > - Why would you EVER want to clear bits just in the "struct page *" or
> > just in the PTE's?
> >
> > - Why is it EVER correct to clear dirty bits except JUST BEFORE THE IO?
> >
> > In other words, I have a theory:
> >
> > "A lot of this is actually historical cruft. Some of it may even be code
> > that was never supposed to work, but because we maintained _other_ dirty
> > bits in the PTE's, and never touched them before, we never even realized
> > that the code that played with PG_dirty was totally insane"
> >
> > Now, that's just a theory. And yeah, it may be stated a bit provocatively.
> > It may not be entirely correct. I'm just saying.. maybe it is?
>
> On Sun, 2006-12-17 at 15:40 -0800, Andrew Morton wrote:
>
> > try_to_free_buffers() clears the page's dirty state if it successfully removed
> > the page's buffers.
> >
> > Background for this:
> >
> > - a process does a one-byte-write to a file on a 64k pagesize, 4k
> > blocksize ext3 filesystem. The page is now PageDirty, !PgeUptodate and
> > has one dirty buffer and 15 not uptodate buffers.
> >
> > - kjournald writes the dirty buffer. The page is now PageDirty,
> > !PageUptodate and has a mix of clean and not uptodate buffers.
> >
> > - try_to_free_buffers() removes the page's buffers. It MUST now clear
> > PageDirty. If we were to leave the page dirty then we'd have a dirty, not
> > uptodate page with no buffer_heads.
> >
> > We're screwed: we cannot write the page because we don't know which
> > sections of it contain garbage. We cannot read the page because we don't
> > know which sections of it contain modified data. We cannot free the page
> > because it is dirty.
>
> However!! this is not true for mapped pages because mapped pages must
> have the whole (16k in akpm's example) page loaded. Hence I suspect that
> what Andrei did by accident - remove the if (mapping) case in
> test_clean_dirty_pages() - is actually totally correct.

Obviously I need my morning shot, 64k ofcourse.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/