Re: mapping user space buffer to kernel address space

From: Andrea Arcangeli (andrea@suse.de)
Date: Mon Oct 16 2000 - 17:13:49 EST


On Mon, Oct 16, 2000 at 10:14:01PM +0100, Stephen C. Tweedie wrote:
> [..] If the VM
> chooses to unmap the page temporarily, then as long as the page
> remains in physical memory, then a subsequent page fault will
> reestablish the mapping. [..]

Correct. But the problem is that the page won't stay in physical memory after
we finished the I/O because swap cache with page count 1 will be freed by the
VM. And even it stays in physical memory because a read fault happened on the
swap cache before we freed it, we could have swap cache that isn't coherent
with the on-disk data (that could lead to the same problem but later).

And anyways from a design standpoint it looks much better to really pin the
page in the pte too (just like kernel reserved pages are pinend after a
remap_page_range).

> It's an important distinction, because we really would rather avoid
> taking the page lock. If you happen to have the same page mapped into
> multiple VAs within the region being written, should the kernel
> object? [..]

I see your point but if you want to allow that we should simply check if the
page that we can't lock is just locked in the earlier part of the kiobuf (just
browsing backwards the iobuf->maplist). I just don't think that not locking the
page is the right way to provide that feature.

I think it's not horribly wrong if people can't do map_user_kiobuf on virtual
pages pointing to the same physical page because that functionality is quite
special, just like rawio is quite special in requiring people to use
hardblocksize aligned buffers. And yes, we could also allow people to do
rawio without that userspace alignment constraint but with that constraint
we force them to do zero copy.

> shouldn't matter. For some other applications, it might be important,
> which is why I wanted to keep the map_user_kiobuf() and lock_kiobuf()
> functions distinct.

I'm not sure which are those apps but if we need we can easily handle that case
by browsing backwards the maplist in the TryLockPage == 1 slow path.

> Note that if you have a threaded application and another thread goes
> messing with the MM while your IO is in progress, then it's possible
> that the pages in the user's VA at the end of the IO are not the same
> as the ones that were there at the start. No big deal, that's no
> different to the situation if you have any other read or write going
> on in parallel with other MM activity.

Yep, that looks ok.

> > + if (!(map = follow_page(ptr, write))) {
> > + char * user_ptr = (char *) ptr;
> > + char c;
> > spin_unlock(&mm->page_table_lock);
> > + up(&mm->mmap_sem);
> > + if (get_user(c, user_ptr))
> > + goto out;
> > + if (write && put_user(c, user_ptr))
> > + goto out;
> > + down(&mm->mmap_sem);
> > + goto refind;
>
> looks unnecessarily messy. We don't need the get_user() in ptrace:
> why do we need it here? Similarly, the put_user should be dealt with
> by the handle_mm_fault. We already absolutely rely on the fact that
> handle_mm_fault never continually fails to make progress for ever. If
> it did, then a page fault could produce an infinite loop in the
> kernel.

Replacing the get_user/put_user with handle_mm_fault _after_ changing
follow_page to check the dirty bit too in the write case should be ok.

If you want I can do this change plus the backwards maplist check for allowing
mapping the same physical page multiple times in a kiobuf (the unmap_kiobuf
will unlock the same physical page multiple times but that's ok).

> Once I'm back in the UK I'll look at getting map_user_kiobuf() simply
> to call the existing access_one_page() from ptrace. You're right,

access_one_page is missing the pagetable lock too, but that seems the only
problem. I'm not convinced mixing the internal of access_one_page and
map_user_kiobuf is a good thing since they needs to do a very different thing
in the critical section.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Oct 23 2000 - 21:00:10 EST