Re: [RFC] revoke(2) and generic handling of things like remove_proc_entry()

From: Eric W. Biederman
Date: Thu Apr 11 2013 - 17:27:28 EST


Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Fri, Apr 05, 2013 at 05:29:32AM +0100, Al Viro wrote:
>> 4) nasty semantics issue - mmap() vs. revoke (of any sort, including
>> remove_proc_entry(), etc.). Suppose a revokable file had been mmapped;
>> now it's going away. What should we do to its VMAs? Right now sysfs
>> and procfs get away with that, but only because there's only one thing
>> that has ->mmap() there - /proc/bus/pci and sysfs equivalents. I've
>> no idea how does pci_mmap_page_range() interact with PCI hotplug (and
>> I'm not at all sure that whatever it does isn't racy wrt device removal),
>> but I suspect that it strongly depends on lack of ->fault() for those
>> VMAs, which makes killing all PTEs pointing to pages in question enough.
>> How generic do we want to make it? Anybody wanting to add more files
>> that could be mmapped in procfs/sysfs/debugfs deserves to be hurt, but
>> if we start playing with revoke(2), restriction might become inconvenient.
>> I'm not sure what kind of behaviour do we want there - *BSD at least
>> used to have revoke(2) only for character devices that had no mmap()...
>
> Actually, after looking at what sysfs does... We might get away with
> the following
> * new vma flag - VM_REVOKABLE; set by mmap() if ->f_revoke is
> non-NULL. We are short on spare bits there, but there still are some...
> * start_using_vma(vma) that checks the presence of that flag,
> returns true if it's absent and __start_using(vma->vm_file->f_revoke)
> otherwise; a matching stop_using_vma(vma) as well.
> * surround vma method calls with start_using_vma/stop_using_vma,
> similar to file ones. Do what fs/sysfs/bin.c wrappers do for revoked
> ones - VM_FAULT_SIGBUS for ->fault() and ->page_mkwrite(), -EINVAL for
> ->access() and ->set_policy(), vma->vm_policy for ->get_policy(),
> 0 for ->migrate(), "do nothing" for ->open() (and I'm not at all sure that
> this one is correct), hell knows what for ->close(). Note that the *only*
> instance with ->open and without ->close is sysfs pile of wrappers
> itself...

sysfs used to wrap close but nothing used it, and on review it turned to
be impossible to properly call close on all of the vmas on revoke with
all of the proper locks held without help from the mm subsystem. So now
sysfs refuses a mapping if ->close is implemented. In practice that
means having an implementation of ->open in sysfs is likely a waste of
space because open and closed are always paired.

My memory is that the sysfs wrappers work for a pretty narrow range of
cases, and are things only work correctly because memory mapping pci
uses none of the sophisticated facilities.

Last time I was looking at this I was noticing that there is a lock
(mmap_sem?) that is held over every ->vm_op->foo() call. If that is
true today it should be possible to just grab that lock and change
vm_ops. That makes for a very cheap and easy implementation, except for
the covolutions needed for taking the lock.

> Hell knows... We have few enough call sites for ->vm_op->foo() to make
> it feasible and overhead would be trivial. OTOH, I'm not sure what's the
> right behaviour for mmap of something like drm after revoke(2) - leaving
> writable pages there looks wrong...

I'm not certain of the peculiarities of drm. In general the correct
semantics of a revoked memory area is that of what we do with a mmap
area on truncate. Retaining the vma with no pages mapped and generating
SIGBUS on an access.

> BTW, snd_card_disconnect() doesn't do anything to existing mappings; smells
> like a bug, and there we do have ones with non-trivial ->mmap(). Could
> ALSA folks comment?

uio might be worth considering as well. As some uio devices are
actually hotpluggable..

> One note about the mockup implementation upthread - __release_revoke() should
> suck in a bit more than just ->release() - turning fasync off should also go
> there.

If we can do add useful support at the fs and mm layers without
affecting performance I am all for it. I remember that tends to make
things easier. As an alternative let me suggest what I had intended to
do if/when I ever got back to working on revoke.

Make a library like libfs that can be used for files that want to
implement revoke support.

In that library implement what can be implemented reliably and correctly
and error on the sophisticated cases we can't support.

With the semantics and the basic users figured out move what bits we can
into the vfs or the mm subsystem to make things easier.

With a library at the very least we have one implementation that we can
debug and work with instead of a different implementation of revoke for
each different kind of file.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/