Re: [openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

From: Caitlin Bestler
Date: Fri Apr 29 2005 - 19:33:25 EST

On 4/29/05, Libor Michalek <libor@xxxxxxxxxxx> wrote:

> However, you have a potential problem with registered buffers that
> do not begin or end on a page boundary, which is common with malloc.
> If the buffer resides on a portion of a page, and you mark the vm
> which contains that entire page VM_DONTCOPY, to ensure that the parent
> has access to the exact physical page after the fork, the child will
> not be able to access anything on that entire page. So if the child
> expects to access data on the same page that happens to contain the
> registered buffer it will get a segment violation.
> The four situations we've discussed are:
> 1) Physical page does not get used for anything else.
> 2) Processes virtual to physical mapping remains fixed.
> 3) Same virtual to physical mapping after forking a child.
> 4) Forked child has access to all non-registered memory of
> the parent.
> The first two are now taken care of with get_user_pages, (we use to
> use VM_LOCKED for the second case) third case is handled by setting
> the vm to VM_DONTCOPY, and on the fourth case we've always punted,
> but the real answer is to break partial pages into seperate vms and
> mark them ALWAYS_COPY.
> -Libor
Attempting to provide *any* support for applications that fork children
after doing RDMA registrations is a ratshole best avoided. The general
rule that application developers should follow is to do RDMA *only*
in the child processes.

Keep in mind that it is not only the memory regions that must be dealt
with, but control data invisible to the user (the QP context, etc.). This
data frequently is interlinked between kernel residente and user resident
data (such as a QP context has the PD ID somewhere on-chip or in
kernel, which the Send Queue ring needs to be in user memory). Having
two different user processes that both think they have the user half to
this type of split data structure is just asking for trouble, even if you
manage to get the copy on write bit timing problems all solved.

All of this can be avoided by a simple rule: don't fork after opening
an RDMA device.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at