Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)

From: Jim Newsome
Date: Fri Mar 12 2021 - 15:02:36 EST


(Re-sent without html part, which the list rejected)

On 3/12/21 12:47, Andrew Morton wrote:

> IOW, please spend a bit of time selling the patch! What is the case
> for including it in Linux? What benefit does it provide our users?

Ah yes - I'd included some context when I first reached out to Oleg, but
that never made it to the list :).

I'm helping develop a new ptrace-based version of Shadow [1] - a tool
for simulating a (potentially large) network. Shadow runs the network
user-space applications in an emulated environment, and routes network
traffic through a model of the network accounting for latency,
bandwidth, etc. The Tor Project plans to make increasing use of Shadow
both for focused evaluation of specific proposed software and parameter
changes, attacks, and defenses, and as a regular automated performance
evaluation prior to deployment of new versions. Today Shadow is already
actively used in the research community for applications including tor
and bitcoin.

We're interested in running simulations including at least tens of
thousands of processes, with a stretch goal of being able to handle 1M
processes. Since each process is being ptraced, calling an O(n) waitpid
has a huge performance penalty at this scale, and results in simulation
performance growing ~quadratically with the size of the simulation.

We do have a workaround where we use a "fork proxy" thread to actually
fork all the processes, and we stop and detach inactive processes. (The
number of "active" processes is roughly fixed to the number of worker
threads, which is generally the # of CPUs available). i.e. this keeps
the number of children and tracees small and fixed, allowing us to scale
linearly. However, having to detach and reattach tracees adds a
significant linear overhead factor. This kernel patch would allow us to
get rid of the complexity and extra overhead of this workaround, and
benefit other applications that haven't implemented such a workaround.

We have some details and analysis of this issue in GitHub [2]. I haven't
added results with the patch yet, but plan to do so.

This should also help other applications that have a large number of
children or tracees. Ptrace-based emulation tools are probably the most
likely to benefit. e.g. I suspect it'll help User Mode Linux [3], which
IIUC uses a single tracer thread to ptrace every process running on its
kernel. Likewise it could help DetTrace [4], which uses ptrace for
deterministic software builds.

[1]: https://shadow.github.io/
[2]: https://github.com/shadow/shadow/issues/1134
[3]: https://en.wikipedia.org/wiki/User-mode_Linux
[4]: https://github.com/dettrace/dettrace