Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot

From: Matthew Wilcox
Date: Tue Jun 03 2025 - 14:30:06 EST


On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:
>
> 1. We are copying the contents of a page with a memcpy() while userspace
> may be writing to it. This can cause the resulting data in the child to
> be inconsistent.
> 2. After we've copied this page, future writes to other pages may
> continue to be visible to the child while future writes to this page are
> no longer visible to the child.
>
> This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).

Um, OK, but isn't that expected behaviour? POSIX says:

: A process shall be created with a single thread. If a multi-threaded
: process calls fork(), the new process shall contain a replica of the
: calling thread and its entire address space, possibly including the
: states of mutexes and other resources. Consequently, the application
: shall ensure that the child process only executes async-signal-safe
: operations until such time as one of the exec functions is successful.

It's always been my understanding that you really, really shouldn't call
fork() from a multithreaded process.
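
For reference, a minimal sketch of the only pattern the POSIX text above
permits in a multithreaded caller: the child calls nothing but
async-signal-safe functions (here execv() and _exit()) before a successful
exec. The helper name run_child() is hypothetical and is not part of the
patch or of any existing API.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, exec immediately in the child, and return the
 * child's exit status to the caller (or -1 on failure).
 */
int run_child(const char *path, char *const argv[])
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/*
		 * Child: other threads of the parent do not exist here, and
		 * any locks they held stay held forever. Do not call
		 * malloc(), stdio or anything that takes a mutex; only
		 * async-signal-safe functions until exec succeeds.
		 */
		execv(path, argv);
		_exit(127);	/* exec failed */
	}

	/* Parent: reap the child, retrying if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child never reads shared data structures before exec, it does
not matter to it whether the copied address space is a coherent snapshot.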