Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

From: Libor Michalek
Date: Fri Apr 29 2005 - 12:05:56 EST

On Fri, Apr 29, 2005 at 08:56:20AM -0700, Caitlin Bestler wrote:
> On 4/29/05, Bill Jordan <woodennickel@xxxxxxxxx> wrote:
> > On 4/26/05, Andrew Morton <akpm@xxxxxxxx> wrote:
> >
> > > Our point is that contemporary microprocessors cannot electrically
> > > do what you want them to do!
> > >
> > > Now, conceeeeeeiveably the kernel could keep track of the state of the
> > > pages down to the byte level, and could keep track of all COWed pages and
> > > could look at faulting addresses at the byte level and could copy sub-page
> > > ranges by hand from one process's address space into another process's
> > > after I/O completion. I don't think we want to do that.
> > >
> > > Methinks your specification is busted.
> >
> > I agree in principal. However, I expect this issue will come up with
> > more and more new specifications, and if it isn't addressed once in
> > the linux kernel, it will be kludged and broken many times in many
> > drivers.
> >
> > I believe we need an kernel level interface that will pin user pages,
> > and lock the user vma in a single step. The interface should be used
> > by drivers when the hardware mappings are done. If the process is
> > split into a user operation to lock the memory, and a driver operation
> > to map the hardware, there will always be opportunity for abuse.
> >
> > Reference counting needs to be done by this interface to allow
> > different hardware to interoperate.
> >
> > The interface can't overload the VM_LOCKED flag, or rely on any other
> > attributes that the user can tinker with via any other interface.
> >
> > And as much as I hate to admit it, I think on a fork, we will need to
> > copy parts of pages at the beginning or end of user I/O buffers.
> >
> I agree with all but the last part, in my opinion there is no need to deal
> with fork issues as long as solutions do not result in failures. There is
> *no* basis for a child process to expect that it will inherit RDMA resources.
> A child process that uses such resources will get undefined results, nothing
> further needs to be stated, and no heroic efforts are required to avoid them.

However, you have a potential problem with registered buffers that
do not begin or end on a page boundary, which is common with malloc.
If the buffer resides on a portion of a page, and you mark the vm
which contains that entire page VM_DONTCOPY, to ensure that the parent
has access to the exact physical page after the fork, the child will
not be able to access anything on that entire page. So if the child
expects to access data on the same page that happens to contain the
registered buffer it will get a segment violation.

The four situations we've discussed are:

1) Physical page does not get used for anything else.
2) Processes virtual to physical mapping remains fixed.
3) Same virtual to physical mapping after forking a child.
4) Forked child has access to all non-registered memory of
the parent.

The first two are now taken care of with get_user_pages, (we use to
use VM_LOCKED for the second case) third case is handled by setting
the vm to VM_DONTCOPY, and on the fourth case we've always punted,
but the real answer is to break partial pages into seperate vms and
mark them ALWAYS_COPY.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at