Re: [PATCH/RFC G-U-P experts] IB/umem: Modernize our get_user_pages() parameters

From: Roland Dreier
Date: Mon Feb 06 2012 - 12:40:06 EST

Sorry for the slow reply, I got caught in other business...

On Mon, Jan 30, 2012 at 12:34 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> The hardest part about implementing that is deciding what snappy
> name to give the FOLL_ flag.

(plus a good comment explaining it I guess)

>> I don't think we want to do the "force" semantics or deal with the
>> VM_MAYWRITE possiblity -- the access we give the hardware on
>> behalf of userspace should just match the access that userspace
>> actually has.  It seems that if we don't try to get pages for writing
>> when VM_WRITE isn't set, we don't need force anymore.
> I suspect you never needed or wanted the weird force behaviour on
> shared maywrite, but that you did need the force COW behaviour on
> private currently-unwritable maywrite.  You (or your forebears)
> defined that interface to use the force flag, I'm guessing it was
> for a reason; now you want to change it not to use the force flag,
> and it sounds good, but I'm afraid you'll discover down the line
> what the force flag was for.

Actually I think I understand why the original code passed !write
as the force parameter.

If the user is registering memory with read-only access, there are
two common cases. Possibly the underlying memory really has
a read-only mapping, but probably more often it is just an ordinary
buffer allocated in userspace with malloc() or the like.

In the second case, it's quite likely we have a read/write mapping
of anonymous pages. We'll expose it read-only for RDMA but the
userspace process will write data into the memory via ordinary CPU
access. However, if we do ibv_reg_mr() before initializing the memory
it's quite possible that the mapping actually points to the zero page,
waiting for a CPU write to trigger a COW.

So in the second case, doing GUP without the write flag will leave
the COW untriggered, and we'll end up mapping the zero page to
the hardware, and RDMA won't read the data that userspace actually
writes. So (without GUP extension as we're discussing in this thread)
we're forced to pass write==1 to GUP, even if we expect hardware
to only do reads.

But if we pass write==1, then GUP on the first case (mapping that
is genuinely read-only) will fail, unless we pass force==1 too. But
this should only succeed if we're going to only access the memory
read-only, so we should set force to !writable-access-by-rdma.

Which I think explains why the code is the way it is. But clearly
we could do better if we had a better way of telling GUP our real
intentions -- ie the FOLL_READONLY_COW flag.

> Can you, for example, enforce the permissions set up by the user?
> I mean, if they do the ibv_reg_mr() on a private readonly area,
> so __get_user_pages with the FOLL_APPROPRIATELY flag will fault
> in ZERO_PAGEs, can you enforce that RDMA will never spray data
> into those pages?

Yes, the access flags passed into ibv_reg_mr() are enforced by
the RDMA hardware, so if no write access is request, no write
access is possible.

And presumably if we do GUP with write==1, force==0 that will
fail on a read-only mapping?

- R.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at