Re: 2.6.19 file content corruption on ext3

From: Martin Schwidefsky
Date: Wed Dec 20 2006 - 09:27:56 EST


On Wed, 2006-12-20 at 10:01 +0100, Peter Zijlstra wrote:
> Also, what is this page_test_and_clear_dirty() business, that seems to
> be exclusively s390 btw. However they do seem to need this.
>
> > But the "ptep_get_and_clear() + flush_tlb_page()" sequence should
> > hopefully also work.
>
> Yeah, probably, not optimally so on some archs that don't actually need
> the flush though. And as above, I wonder about s390.

Simple, the s390 architecture does not keep the dirty bit in the pte but
in something called the storage key. For each physical page there is one
associated storage key. It is accessed with special instructions like
"iske", "sske" or "rrbe". To clear the dirty bit the storage key of a
page is read with iske, the bit is cleared and the storage key is stored
back with sske. That means that clearing the dirty bit is not an atomic
operation. rrbe is used to test and clear the referenced bit (young/old
infomation) and is atomic in regard to other storage key operations. If
you think about it, the storage keys are quite nice for the operating
system, page_referenced() can be implemented with a single test
"page_test_and_clear_young()". No need to read all the ptes pointing to
the page. The downside is that the storage keys have a cost on the
hardware side.

--
blue skies,
Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/