On Tue, Jun 3, 2025 at 8:37 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> On 03.06.25 20:29, Matthew Wilcox wrote:
> > On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> > > When fork() encounters possibly-pinned pages, those pages are immediately
> > > copied instead of just marking PTEs to make CoW happen later. If the parent
> > > is multithreaded, this can cause the child to see memory contents that are
> > > inconsistent in multiple ways:
> > > 1. We are copying the contents of a page with a memcpy() while userspace
> > > may be writing to it. This can cause the resulting data in the child to
> > > be inconsistent.
> > > 2. After we've copied this page, future writes to other pages may
> > > continue to be visible to the child while future writes to this page are
> > > no longer visible to the child.
> > > This means the child could theoretically see incoherent states where
> > > allocator freelists point to objects that are actually in use or stuff like
> > > that. A mitigating factor is that, unless userspace already has a deadlock
> > > bug, userspace can pretty much only observe such issues when fancy lockless
> > > data structures are used (because if another thread was in the middle of
> > > mutating data during fork() and the post-fork child tried to take the mutex
> > > protecting that data, it might wait forever).
> > Um, OK, but isn't that expected behaviour? POSIX says:
> > : A process shall be created with a single thread. If a multi-threaded
> > : process calls fork(), the new process shall contain a replica of the
> > : calling thread and its entire address space, possibly including the
> > : states of mutexes and other resources. Consequently, the application
> > : shall ensure that the child process only executes async-signal-safe
> > : operations until such time as one of the exec functions is successful.
> > It's always been my understanding that you really, really shouldn't call
> > fork() from a multithreaded process.
> I have the same recollection, but rather because of concurrent O_DIRECT
> and locking (pthread_atfork ...).
> Using the allocator example above: what makes sure that no other thread
> is halfway through modifying allocator state? You really have to sync
> somehow before calling fork() -- e.g., grabbing allocator locks in
> pthread_atfork().
Yeah, like what glibc does for its malloc implementation to prevent
allocator calls from racing with fork(), so that malloc() keeps
working after fork(), even though POSIX says that the libc doesn't
have to guarantee that.