Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

From: Michael S. Tsirkin
Date: Sun Dec 24 2006 - 16:35:58 EST


> Quoting Linus Torvalds <torvalds@xxxxxxxx>:
> Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
>
> Peter, tell me I'm crazy, but with the new rules, the following condition
> is a bug:
>
> - shared mapping
> - writable
> - not already marked dirty in the PTE
>
> because that combination means that the hardware can mark the PTE dirty
> without us even realizing (and thus not marking the "struct page *"
> dirty).

Er.
Sorry about bumping in, and I'm not sure I understand all of the discussion,
but this reminded me of an old issue with COW that created what looks
like a vaguely similiar data corruption on infiniband. We solved this for
infiniband with MADV_DONTFORK, but I always wondered why does it not affect
other parts of kernel. Small reminder from that discussion:

down mmap sem
get user pages
up mmap sem
page becomes shared, and COW (e.g. fork)
process writes to first byte of page <----- gets a copy
Now we had a problem: struct page that we got from get user pages
does not point to a correct page in our process.
For example: if at some point we map this page for DMA, and
hardware writes to last byte of page -----> process does not
see this data.

So for infiniband, what we do is a combination of
- prevent page from becoming COW while hardware might DMA to this page, and
- ask users not to write to page if hardware might DMA to same page
(even if its using different bytes).

I just wandered - is there some chance something like this could be happening in
the fs code?

HTH,

--
MST
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/