Re: truncate shows non zero data beyond the end of the inode with MAP_SHARED

From: William Lee Irwin III
Date: Wed Sep 15 2004 - 16:10:05 EST


On Wed, Sep 15, 2004 at 02:29:20PM +0200, Andrea Arcangeli wrote:
> I've been told we're not posix compliant the way we handle MAP_SHARED
> on the last page of the inode. Basically after we map the page into
> userspace people can make the data beyond the i_size non-zero and we
> should clear it in the transition from page_mapcount 1 -> 0. The bug
> is that if you truncate-extend, the new data will not be guaranteed to
> be zero.
> msync + power outage and writing to the page with sys_write at the same
> time it's being mapped (and in turn queueing it for pdflush writeout)
> are the two worst offeners. To fix those we'd need to mark the pte
> readonly, flush the tlb with a worst-case IPI broadcast, writepage, then
> mark the pte read-write and flush the tlb again with another IPI
> broadcast.
> That is going to have a significant cost methinks. So maybe we shouldn't
> fix it after all...

Zeroing the final partial page during expanding truncate (flushing TLB)
sounds like a reasonable half measure; we don't do anything at the moment.
We could even, say, do a pass of pte shootdown given try_to_unmap() and
force minor faults to make all this less likely.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/