Re: [RFC PATCH 1/3 v2] futex: introduce FUTEX_SWAP operation

From: Andrei Vagin
Date: Wed Jun 17 2020 - 20:49:05 EST


On Tue, Jun 16, 2020 at 10:22:26AM -0700, Peter Oskolkov wrote:
> From 6fbe0261204692a7f488261ab3c4ac696b91db5c Mon Sep 17 00:00:00 2001
> From: Peter Oskolkov <posk@xxxxxxxxxx>
> Date: Tue, 9 Jun 2020 16:03:14 -0700
> Subject: [RFC PATCH 1/3 v2] futex: introduce FUTEX_SWAP operation
>
> This is an RFC!
>
> As Paul Turner presented at LPC in 2013 ...
> - pdf: http://pdxplumbers.osuosl.org/2013/ocw//system/presentations/1653/original/LPC%20-%20User%20Threading.pdf
> - video: https://www.youtube.com/watch?v=KXuZi9aeGTw
>
> ... Google has developed an M:N userspace threading subsystem backed
> by Google-private SwitchTo Linux Kernel API (page 17 in the pdf referenced
> above). This subsystem provides latency-sensitive services at Google with
> fine-grained user-space control/scheduling over what is running when,
> and this subsystem is used widely internally (called schedulers or fibers).
>
> This RFC patchset is the first step to open-source this work. As explained
> in the linked pdf and video, SwitchTo API has three core operations: wait,
> resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
> that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
> on top of which user-space threading libraries can be built.
>
> Another common use case for FUTEX_SWAP is message passing a-la RPC
> between tasks: task/thread T1 prepares a message,
> wakes T2 to work on it, and waits for the results; when T2 is done, it
> wakes T1 and waits for more work to arrive. Currently the simplest
> way to implement this is
>
> a. T1: futex-wake T2, futex-wait
> b. T2: wakes, does what it has been woken to do
> c. T2: futex-wake T1, futex-wait
>
> With FUTEX_SWAP, steps a and c above can be reduced to one futex operation
> that runs 5-10 times faster.
>

Hi Peter,

We have a good use-case in gVisor for this new futex command.

gVisor accesses a file system through a file proxy called the Gofer.
The Gofer runs as a separate process that is isolated from the sandbox
(the Sentry). Gofer instances communicate with their respective Sentry
instances using a 9P-like protocol. We used to use sockets as
communication channels, but recently we switched to the flipcall (1)
library, which improves performance by using shared memory for data
(reducing memory copies) and futexes for control signaling (which is
much cheaper than sendto/recvfrom/sendmsg/recvmsg).
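For reference, here is a minimal C sketch of the FUTEX_WAKE & FUTEX_WAIT
baseline (steps a-c in the quoted text) that this kind of futex-based
control signaling comes down to. It is not flipcall's code (flipcall is
written in Go). Assumptions: both sides are threads in one process, so it
uses FUTEX_WAKE_PRIVATE/FUTEX_WAIT_PRIVATE; two processes sharing a memory
mapping, as with flipcall, would need the non-private operations. The
names channel, signal_turn, wait_turn and ping_pong are hypothetical.
FUTEX_SWAP itself is left out because it exists only in this RFC patch,
not in mainline headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { CLIENT = 0, SERVER = 1 };

/* Hypothetical channel: a data slot in shared memory plus a futex word
 * recording whose turn it is to run. */
struct channel {
    _Atomic uint32_t turn;
    int rounds;
    long payload;
};

static long futex_op(_Atomic uint32_t *uaddr, int op, uint32_t val) {
    return syscall(SYS_futex, (uint32_t *)uaddr, op, val, NULL, NULL, 0);
}

/* Hand the channel to `next` and wake the peer (FUTEX_WAKE). */
static void signal_turn(struct channel *ch, uint32_t next) {
    atomic_store(&ch->turn, next);
    futex_op(&ch->turn, FUTEX_WAKE_PRIVATE, 1);
}

/* Sleep until it is `me`'s turn (FUTEX_WAIT). The kernel re-checks the
 * word against `cur` before sleeping, so a racing wake is not lost. */
static void wait_turn(struct channel *ch, uint32_t me) {
    uint32_t cur;
    while ((cur = atomic_load(&ch->turn)) != me)
        futex_op(&ch->turn, FUTEX_WAIT_PRIVATE, cur);
}

static void *server(void *arg) {
    struct channel *ch = arg;
    for (int i = 0; i < ch->rounds; i++) {
        wait_turn(ch, SERVER);    /* step b: woken with a request */
        ch->payload += 1;         /* the "work": reply with request + 1 */
        signal_turn(ch, CLIENT);  /* step c: wake client, loop to wait */
    }
    return NULL;
}

/* Runs `rounds` request/response exchanges; returns how many completed,
 * or -1 if the server thread could not be started. */
int ping_pong(int rounds) {
    struct channel ch = { .turn = CLIENT, .rounds = rounds, .payload = 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, server, &ch) != 0)
        return -1;
    int done = 0;
    for (int i = 0; i < rounds; i++) {
        ch.payload = i;            /* prepare the request */
        signal_turn(&ch, SERVER);  /* step a: wake the server ... */
        wait_turn(&ch, CLIENT);    /* ... and wait for the reply */
        assert(ch.payload == i + 1);
        done++;
    }
    pthread_join(t, NULL);
    return done;
}
```

Each round trip costs each side a FUTEX_WAKE plus a FUTEX_WAIT. According to
the quoted patch description, FUTEX_SWAP would turn each signal_turn/wait_turn
pair into a single futex operation.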

I modified the flipcall library to use FUTEX_SWAP and I see a
significant performance improvement.

A low-level benchmark (2) shows that a request/response round trip is
more than five times faster with FUTEX_SWAP than with FUTEX_WAKE &
FUTEX_WAIT. This is more or less the same test as the one you did.

* FUTEX_WAKE & FUTEX_WAIT
BenchmarkSendRecv-8 88396 13625 ns/op

* FUTEX_SWAP
BenchmarkSendRecv-8 479604 2524 ns/op

And a higher-level test (3), which benchmarks the open syscall in
gVisor, shows an improvement of about 40% (the mean latency drops from
roughly 94,000 ns to 53,000 ns).

* FUTEX_WAKE & FUTEX_WAIT
BM_Open/1/real_time_mean 93996 ns

* FUTEX_SWAP
BM_Open/1/real_time_mean 53136 ns
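To make the two headline figures easy to check, here is the arithmetic on
the numbers reported above as two small C helpers. The function names are
hypothetical; both inputs are mean per-operation latencies in nanoseconds.

```c
/* How many times faster the new latency is than the old one. */
double speedup(double before_ns, double after_ns) {
    return before_ns / after_ns;
}

/* Relative latency reduction, as a percentage of the old latency. */
double reduction_pct(double before_ns, double after_ns) {
    return 100.0 * (before_ns - after_ns) / before_ns;
}
```

speedup(13625, 2524) is about 5.4 for the SendRecv benchmark, and
reduction_pct(93996, 53136) is about 43.5 for BM_Open, which corresponds to
a speedup of roughly 1.77.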

I believe there are many use-cases for FUTEX_SWAP in other projects.

1. https://github.com/google/gvisor/tree/master/pkg/flipcall
2. https://github.com/google/gvisor/blob/master/pkg/flipcall/flipcall_test.go#L361
3. https://github.com/google/gvisor/blob/master/test/perf/linux/open_benchmark.cc

Thanks,
Andrei