Re: [RFC][PATCH 6/8] mm: handle_speculative_fault()

From: Linus Torvalds
Date: Mon Jan 04 2010 - 22:14:51 EST




On Tue, 5 Jan 2010, KAMEZAWA Hiroyuki wrote:
>
> I'm sorry if I miss something...how does this patch series avoid
> that vma is removed while __do_fault()->vma->vm_ops->fault() is called ?
> ("vma is removed" means all other things as freeing file struct etc..)

I don't think you're missing anything.

Protecting the vma isn't enough. You need to protect the whole FS stack
with rcu. Probably by moving _all_ of "free_vma()" into the RCU path
(which means that the whole file/inode gets de-allocated at that later RCU
point, rather than synchronously). Not just the actual kfree.

However, it's worth noting that that actually has some very subtle and
major consequences. If you have a temporary file that was removed, where
the mmap() was the last user that kind of delayed freeing would also delay
the final fput of that file that actually deletes it.

Or put another way: if the vma was a writable mapping, a user may do

munmap(mapping, size);

and the backing file is still active and writable AFTER THE MUNMAP! This
can be a huge problem for something that wants to unmount the volume, for
example, or depends on the whole writability-vs-executability thing. The
user may have unmapped it, and expects the file to be immediately
non-busy, but with the delayed free that isn't the case any more.

In other words, now you may well need to make munmap() wait for the RCU
grace period, so that the user who did the unmap really is synchronous wrt
the file accesses. We've had things like that before, and they have been
_huge_ performance problems (ie it may take just a timer tick or two, but
then people do tens of thousands of munmaps, and now that takes many
seconds just due to RCU grace period waiting.

I would say that this whole series is _very_ far from being mergeable.
Peter seems to have been thinking about the details, while missing all the
subtle big picture effects that seem to actually change semantics.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/