Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

From: Andrew Morton
Date: Tue Apr 26 2005 - 19:06:57 EST

Roland Dreier <roland@xxxxxxxxxxx> wrote:
> Andrew> Well I was vaguely proposing that the userspace library
> Andrew> keep track of the byteranges and the underlying page
> Andrew> states. So in the above scenario userspace would leave
> Andrew> the page at 0x1000 registered until all registrations
> Andrew> against that page have been undone.
> OK, I already have code in userspace that keeps reference counts for
> overlapping regions, etc. However I'm not sure how to tie this in
> with reliable accounting of pinned memory -- we don't want malicious
> userspace code to be able to fool the accounting, right?
> So I'm still trying to puzzle out what to do. I don't want to keep a
> complicated data structure in the kernel keeping track of what memory
> has been registered. Right now, I just keep a list of structs, one
> for each region, and when a process dies, I just go through region by
> region and do a put_page() to balance off the get_user_pages().
> However I don't see how to make it work if I put the reference
> counting for overlapping regions in userspace but want mlock()
> accounting in the kernel. If a buggy/malicious app does:
> a) register from 0x0000 to 0x2fff
> b) register from 0x1000 to 0x1fff
> c) unregister from 0x0000 to 0x2fff

As far as the kernel is concerned, step b) should be a no-op. (The kernel
might choose to split the vma, but that's not significant).

> then it seems the kernel is screwed unless it counts how many times a
> vma has been pinned. And adding a pin_count member to vm_struct seems
> like a pretty damn major step.
> We definitely have to make sure that userspace is never able to
> unpin a page that is still registered with RDMA hardware, because that
> can lead to DMA into memory that someone else owns. On the other
> hand, we don't want userspace to be able to defeat resource accounting
> by tricking the kernel into keeping page_count elevated after it
> credits the memory back to a process's limit on locked pages.

The kernel can simply register and unregister ranges for RDMA. So
effectively a particular page is in either the registered or unregistered
state. Kernel accounting counts the number of registered pages and
compares this with rlimits.
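A minimal userspace simulation of that kernel-side model, for illustration only. It assumes 4 KiB pages and a tiny fixed address space, and the names kernel_register(), kernel_unregister() and struct pin_state are hypothetical, not real kernel interfaces; where a real implementation would pin or release pages (get_user_pages()/put_page()) the sketch just flips a per-page flag. The point is that each page is either registered or not, so the rlimit is checked against pages, never against registrations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define NPAGES     16   /* hypothetical: tiny address space for the sketch */

/* Hypothetical per-process kernel state: one flag per page. */
struct pin_state {
    bool registered[NPAGES];
    size_t nr_registered;   /* pages currently pinned for RDMA */
    size_t limit;           /* stand-in for the locked-memory rlimit, in pages */
};

/* Register the byte range [start, end]. Pages that are already registered
 * are left alone, so overlapping requests never double-count. Returns -1
 * without changing anything if the range is invalid or the limit would
 * be exceeded. */
static int kernel_register(struct pin_state *s, unsigned long start, unsigned long end)
{
    size_t first = start >> PAGE_SHIFT, last = end >> PAGE_SHIFT, newly = 0;

    if (first > last || last >= NPAGES)
        return -1;
    for (size_t i = first; i <= last; i++)
        if (!s->registered[i])
            newly++;
    if (s->nr_registered + newly > s->limit)
        return -1;
    for (size_t i = first; i <= last; i++) {
        if (!s->registered[i]) {
            s->registered[i] = true;   /* real code would pin the page here */
            s->nr_registered++;
        }
    }
    return 0;
}

/* Unregister [start, end]: every registered page in the range is released
 * and credited back exactly once. */
static int kernel_unregister(struct pin_state *s, unsigned long start, unsigned long end)
{
    size_t first = start >> PAGE_SHIFT, last = end >> PAGE_SHIFT;

    if (first > last || last >= NPAGES)
        return -1;
    for (size_t i = first; i <= last; i++) {
        if (s->registered[i]) {
            assert(s->nr_registered > 0);
            s->registered[i] = false;  /* real code would release the page here */
            s->nr_registered--;
        }
    }
    return 0;
}
```

Fed the quoted a)/b)/c) sequence with a limit of 8 pages, a) brings nr_registered to 3, b) is a no-op and leaves it at 3, and c) brings it back to 0, so there is neither a stale pin nor a miscredited page.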

On top of all that, your userspace library needs to keep track of when
pages should really be registered and unregistered with the kernel, using
overlap logic, per-page refcounting or whatever.
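The userspace layer could look roughly like this sketch. It assumes 4 KiB pages, and every name in it is hypothetical: lib_register()/lib_unregister() stand for the library entry points, and kernel_register_page()/kernel_unregister_page() are stubs in place of whatever system calls the real interface would provide. The library tells the kernel about a page only on a 0->1 or 1->0 refcount transition, so for the quoted a)/b)/c) sequence, step c) leaves the page at 0x1000 registered until the b) registration is also undone.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define NPAGES     16   /* hypothetical: tiny address space for the sketch */

/* Stand-in for the kernel interface. In a real library these would be
 * system calls; here they only flip a flag and count invocations. */
static int kernel_pinned[NPAGES];
static size_t kernel_calls;

static void kernel_register_page(size_t pg)
{
    assert(!kernel_pinned[pg]);   /* library must not register a page twice */
    kernel_pinned[pg] = 1;
    kernel_calls++;
}

static void kernel_unregister_page(size_t pg)
{
    assert(kernel_pinned[pg]);    /* ...nor release one the kernel does not hold */
    kernel_pinned[pg] = 0;
    kernel_calls++;
}

/* Library-side count of how many user registrations cover each page. */
static unsigned lib_refcount[NPAGES];

static int lib_register(unsigned long start, unsigned long end)
{
    size_t first = start >> PAGE_SHIFT, last = end >> PAGE_SHIFT;

    if (first > last || last >= NPAGES)
        return -1;
    for (size_t pg = first; pg <= last; pg++)
        if (lib_refcount[pg]++ == 0)   /* 0 -> 1: first user, tell the kernel */
            kernel_register_page(pg);
    return 0;
}

static int lib_unregister(unsigned long start, unsigned long end)
{
    size_t first = start >> PAGE_SHIFT, last = end >> PAGE_SHIFT;

    if (first > last || last >= NPAGES)
        return -1;
    for (size_t pg = first; pg <= last; pg++)   /* reject unbalanced requests up front */
        if (lib_refcount[pg] == 0)
            return -1;
    for (size_t pg = first; pg <= last; pg++)
        if (--lib_refcount[pg] == 0)   /* 1 -> 0: last user gone, tell the kernel */
            kernel_unregister_page(pg);
    return 0;
}
```

A real library would also have to roll back its refcounts if the kernel refused a registration (for example because of the rlimit); the sketch leaves that out. A buggy library can still only unregister its own pages, since the kernel keeps the authoritative per-page state.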
