Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation

From: Roland Dreier
Date: Mon Apr 25 2005 - 16:21:21 EST

Andrew> Do we care about that? A straightforward scenario under
Andrew> which this can happen is:

Andrew> a) app starts some read I/O in an asynchronous manner
Andrew> b) app forks
Andrew> c) child writes to one of the pages which is still under read I/O
Andrew> d) the read I/O completes
Andrew> e) the child is left with the old data plus the child's modification instead
Andrew> of the new data

Andrew> which is a very silly application which is giving itself
Andrew> unpredictable memory contents anyway.

Andrew> I assume there's a more sensible scenario?

You're right, that is a silly scenario ;) In fact if we mark vmas
with VM_DONTCOPY, then the child just crashes with a seg fault.

The type of thing I'm worried about is something like, for example:

a) app registers memory region with RDMA hardware -- in other words,
loads the device's translation table for future I/O
b) app forks
c) app writes to the registered memory region, and the kernel breaks
the COW for the (now read-only) page by mapping a new page
d) app starts an I/O that will do a DMA read from the region
e) device reads using the wrong, old mapping

This can be pretty insiduous because for example fork() + immediate
exec() or just using system() still leaves the parent with PTEs marked
read-only. If an application does overlapping memory registrations so
get_user_pages() is called a lot, then as far as I can see
can_share_swap_page() will always return 0 and the COW will happen
even if the child process has thrown out its original vmas.

Or if the counts are in the correct range, then there's a small window
between fork() and exec() where the parent process can screw itself
up, so most of the time the app works, until it doesn't.

- R.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at