Re: [PATCH 1/3] fs: Introduce new flag FALLOC_FL_COLLAPSE_RANGE

From: Theodore Ts'o
Date: Thu Aug 01 2013 - 00:06:57 EST

On Thu, Aug 01, 2013 at 12:59:14PM +1000, Dave Chinner wrote:
> This funtionality is not introducing any new problems w.r.t. mmap().
> In terms of truncating a mmap'd file, that can already occur and
> the behaviour is well known.

That's not race I'm worried about. Consider the following scenario.
We have a 10 meg file, and the first five megs are mapped read/write.
We do a collapse range of one meg starting at the one meg boundary.
Menwhile another process is constantly modifying the pages between the
3 meg and 4 meg mark, via a mmap'ed region of memory.

You can have the code which does the collapse range call
filemap_fdatawrite_range() to flush out to disk all of the inode's
dirty pages in the page cache, but between the call to
filemap_fdatawrite_range() and truncate_page_cache_range(), more pages
could have gotten dirtied, and those changes will get lost.

The difference between punch_hole and collapse_range is that
previously, we only needed to call truncate_page_cache_range() on the
pages that were going to be disappearing from the page cache, so it
didn't matter if you had someone dirtying the pages racing with the
punch_hole operation. But in the collapse_range case, we need to call
truncate_page_cache_range() on data that is not going to disappear,
but rather be _renumbered_.

I'll also note that depending on the use case, doing the renumbering
of the pages by throwing all of the pages from the page cache and
forcing them to be read back in from disk might not all be friendly to
a performance sensitive application.

In the case where the page size == the fs block size, instead of
throwing away all of the pages, we could also have VM code which
remaps the pages in the page cache (including potentially editing the
page tables for any mmap'ed pages). This of course gets _much_ more
complicated, and we still have to deal with the case where the
collapse_range call is aligned with the fs block size, but which is
not aligned or is not a muliple of the page size.

- Ted
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at