Re: MAP_PRIVATE that stays private even on external write

From: Peter Zijlstra
Date: Tue Dec 02 2008 - 10:03:42 EST

On Tue, 2008-12-02 at 15:34 +0100, Miquel van Smoorenburg wrote:
> On Tue, 2008-12-02 at 11:27 +0100, Peter Zijlstra wrote:
> > On Tue, 2008-12-02 at 10:06 +0000, Miquel van Smoorenburg wrote:
> > > What I am looking for is a MAP_PRIVATE type flag that, when another
> > > process modifies pages of the file (through mmap() or write())
> > > makes sure that my mapping never sees that.
> > >
> > > There has been talk of a MAP_SNAPSHOT flag before, e.g.
> > >
> > >
> > > Has anyone ever looked at implementing something like this ?
> >
> > I suppose that needs a snapshot filesystem for backing, and we don't
> > have such a creature (yet). I suppose BTRFS might be able to pull that
> > off.
> That is a different way of looking at it. Interesting.
> It won't help me though, as the "file" I was talking about is in
> fact /dev/sdb .
> But the current MAP_PRIVATE doesn't need a versioning filesystem either.
> It's just that if you write to the mapping, that never gets visible in
> other mappings of the same file, nor in the file itself.
> I'd like to see something like that, but the other way around. If the
> file is modified through a different mapping or write(), modifications
> show up in the file and other shared mappings but not in my private
> mapping. A copy-on-write, but slightly different in what is copied.
> Since MAP_PRIVATE|MAP_POPULATE appears to do this, I thought that
> perhaps there is enough infrastructure to do something similar without
> MAP_POPULATE, but I really don't know enough of mm/* .

Thing is, MAP_PRIVATE isolates others from your changes, not you from
other changes.

A valid implementation could map the original file RO and only COW a
page on the first modification - and this is what I thought linux did.
Which means your MAP_POPULATE would do a writable get_user_pages() on it
(and I think that issue was once raised as a bug).

Now what you want is to basically snapshot a file. There is always space
overhead from doing that, that is, new changes need to go somewhere (or
conversely the old data needs to be kept around).

I guess you could do that by force populating MAP_SNAPSHOT mappings of
the same file on writeout to that file (when no such page already
exists), but before doing the writeout. But that sounds a little icky,
esp. as you force the space overhead (which is the full mmap in the
worst case) into memory.

Relying on filesystems to be able to keep the snapshot around, pushes
that overhead into the filesystem (which is typically much larger than
memory and thus the most suited place for this), which seems like a much
better solution.

Anyway, I think both solution are feasible for implementation, the trick
would be to convince 1) us that its a good idea and will not be yet
another unused 'feature' with significant complexity and 2) someone to
do it.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at