[RFC PATCH net-next 0/3] sock: Be aware of memcg pressure on alloc

From: Abel Wu
Date: Fri Sep 01 2023 - 02:23:38 EST


As a cloud service provider, we encountered a problem in our production
environment during the transition from cgroup v1 to v2 (partly due to the
heavy cost of accounting socket memory in v1). Say a workload behaves
fine in cgroup v1 with the memcg limit configured to 10GB of memory plus
another 1GB of tcpmem. In v2, with an 11GB memory limit, the same workload
performs badly (or even gets OOM-killed) because of bursts of socket memory
usage, since cgroup v2 has no specific limit for socket memory and relies
largely on workloads doing traffic control themselves.
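
To make the arithmetic above concrete, here is a minimal userspace sketch,
not kernel code. The struct and function names are hypothetical, and the
model is deliberately simplified: under v1 the socket demand is clamped to
the tcpmem limit, while under v2 socket memory and all other memory are
assumed to count against a single limit.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Hypothetical v1-style limits: socket memory is accounted separately. */
struct v1_limits {
	uint64_t mem;		/* limit for non-socket memory */
	uint64_t tcpmem;	/* separate limit for TCP socket memory */
};

/* v1: socket memory actually granted is the demand clamped to tcpmem. */
uint64_t v1_sock_granted(const struct v1_limits *l, uint64_t sock_demand)
{
	return sock_demand < l->tcpmem ? sock_demand : l->tcpmem;
}

/* v1: only non-socket memory counts against the main limit. */
bool v1_exceeds(const struct v1_limits *l, uint64_t app_mem)
{
	return app_mem > l->mem;
}

/* v2 (simplified): socket memory and other memory share one limit. */
bool v2_exceeds(uint64_t limit, uint64_t app_mem, uint64_t sock_demand)
{
	return app_mem + sock_demand > limit;
}
```

With 10GB of application memory and a 2GB socket burst, the v1 model grants
only 1GB to sockets and stays within the 10GB limit, whereas the v2 model
needs 12GB against an 11GB limit.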

It's rational for the workloads to build some traffic control to better
utilize the resources they bought, but from the kernel's point of view
it's also reasonable to suppress the allocation of socket memory once
there is a shortage of free memory, given that performance degradation is
usually better than failure.

This patchset aims to be more conservative when allocating for
pressure-aware sockets under global and/or memcg pressure, to avoid
further memstall or possibly OOM in such cases. The patchset includes:

1/3: simple code cleanup, no functional change intended.
2/3: record memcg pressure level to enable fine-grained control.
3/3: throttle alloc for pressure-aware sockets under pressure.

The whole patchset focuses on the pressure-aware protocols, and should
have little or no impact on pressure-unaware protocols such as UDP.
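
As a rough illustration of the idea (not the code in this series), the
userspace sketch below models an admission check for socket memory. All
names are hypothetical, and the policy is an assumption made for the
example: pressure-unaware sockets are never throttled, and under global
pressure or memcg pressure of medium level or above, a pressure-aware
socket may only grow up to an assumed per-socket minimum.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical levels, loosely modeled on vmpressure's low/medium/critical. */
enum pressure_level {
	PRESSURE_NONE = 0,
	PRESSURE_LOW,
	PRESSURE_MEDIUM,
	PRESSURE_CRITICAL,
};

/* Hypothetical, simplified view of a socket for this sketch. */
struct sock_model {
	bool pressure_aware;	/* e.g. true for TCP, false for UDP */
	size_t buf_used;	/* bytes currently charged to the socket */
	size_t buf_min;		/* bytes the socket may always use */
};

/*
 * Decide whether an allocation of 'bytes' is admitted.
 * Assumed policy, for illustration only:
 *  - pressure-unaware sockets are never throttled here;
 *  - with no global pressure and memcg pressure below medium,
 *    everything is admitted;
 *  - otherwise the socket may only grow up to buf_min.
 */
bool admit_alloc(const struct sock_model *sk, size_t bytes,
		 bool global_pressure, enum pressure_level memcg_level)
{
	if (!sk->pressure_aware)
		return true;
	if (!global_pressure && memcg_level < PRESSURE_MEDIUM)
		return true;
	return sk->buf_used + bytes <= sk->buf_min;
}
```

The point of keeping a level rather than a boolean is that the threshold
at which throttling starts can be chosen per caller; the medium threshold
used here is arbitrary.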

Tested on a dual-socket Intel Xeon(R) Platinum 8260 machine with 2 NUMA
nodes, each of which has 24C/48T. All the benchmarks were run inside a
separate memcg on a clean host.

baseline: net-next c639a708a0b8
compare: baseline + patchset

case             load        baseline(std%)  compare%( std%)
tbench-loopback  thread-24   1.00 (  0.50)   -0.98 (  0.87)
tbench-loopback  thread-48   1.00 (  0.76)   -0.29 (  0.92)
tbench-loopback  thread-72   1.00 (  0.75)   +1.51 (  0.14)
tbench-loopback  thread-96   1.00 (  4.11)   +1.29 (  3.73)
tbench-loopback  thread-192  1.00 (  3.52)   +1.44 (  3.30)
TCP_RR           thread-24   1.00 (  1.87)   -0.87 (  2.40)
TCP_RR           thread-48   1.00 (  0.92)   -0.22 (  1.61)
TCP_RR           thread-72   1.00 (  2.35)   +2.42 (  2.27)
TCP_RR           thread-96   1.00 (  2.66)   -1.37 (  3.02)
TCP_RR           thread-192  1.00 ( 13.25)   +0.29 ( 11.80)
TCP_STREAM       thread-24   1.00 (  1.26)   -0.75 (  0.87)
TCP_STREAM       thread-48   1.00 (  0.29)   -1.55 (  0.14)
TCP_STREAM       thread-72   1.00 (  0.05)   -1.59 (  0.05)
TCP_STREAM       thread-96   1.00 (  0.19)   -0.06 (  0.29)
TCP_STREAM       thread-192  1.00 (  0.23)   -0.01 (  0.28)
UDP_RR           thread-24   1.00 (  2.27)   +0.33 (  2.82)
UDP_RR           thread-48   1.00 (  1.25)   -0.30 (  1.21)
UDP_RR           thread-72   1.00 (  2.54)   +2.99 (  2.34)
UDP_RR           thread-96   1.00 (  4.76)   +2.49 (  2.19)
UDP_RR           thread-192  1.00 ( 14.43)   -0.02 ( 12.98)
UDP_STREAM       thread-24   1.00 (107.41)   -0.48 (106.93)
UDP_STREAM       thread-48   1.00 (100.85)   +1.38 (100.59)
UDP_STREAM       thread-72   1.00 (103.43)   +1.40 (103.48)
UDP_STREAM       thread-96   1.00 ( 99.91)   -0.25 (100.06)
UDP_STREAM       thread-192  1.00 (109.83)   -3.67 (104.12)

Patch 3 moves the traversal of the cgroup hierarchy earlier for
pressure-aware protocols, which could turn a conditional overhead into a
constant one, so tests were also run inside cgroups nested 5 levels deep.
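
To illustrate why nesting depth matters for such a check, here is a small
userspace model of walking from a cgroup up to the root. The struct and
function names are hypothetical stand-ins, not the kernel's, and the
per-level boolean is a simplification of whatever pressure state the
kernel actually records.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup. */
struct memcg_model {
	struct memcg_model *parent;	/* NULL at the root */
	bool under_pressure;
};

/*
 * Walk from 'memcg' up to the root and report whether any level is under
 * pressure. The walk stops at the first level found under pressure.
 * 'visited', if non-NULL, receives the number of levels inspected.
 */
bool hierarchy_under_pressure(const struct memcg_model *memcg,
			      unsigned int *visited)
{
	unsigned int n = 0;
	bool found = false;

	for (; memcg; memcg = memcg->parent) {
		n++;
		if (memcg->under_pressure) {
			found = true;
			break;
		}
	}
	if (visited)
		*visited = n;
	return found;
}
```

In this model the no-pressure case is the expensive one: every ancestor is
visited, so the cost grows with depth. If the walk runs on every allocation
rather than only on some paths, that cost is paid unconditionally, which is
what the nested-cgroup runs below are meant to measure.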

case             load        baseline(std%)  compare%( std%)
tbench-loopback  thread-24   1.00 (  0.59)   +0.68 (  0.09)
tbench-loopback  thread-48   1.00 (  0.16)   +0.01 (  0.26)
tbench-loopback  thread-72   1.00 (  0.34)   -0.67 (  0.48)
tbench-loopback  thread-96   1.00 (  4.40)   -3.27 (  4.84)
tbench-loopback  thread-192  1.00 (  0.49)   -1.07 (  1.18)
TCP_RR           thread-24   1.00 (  2.40)   -0.34 (  2.49)
TCP_RR           thread-48   1.00 (  1.62)   -0.48 (  1.35)
TCP_RR           thread-72   1.00 (  1.26)   +0.46 (  0.95)
TCP_RR           thread-96   1.00 (  2.98)   +0.13 (  2.64)
TCP_RR           thread-192  1.00 ( 13.75)   -0.20 ( 15.42)
TCP_STREAM       thread-24   1.00 (  0.21)   +0.68 (  1.02)
TCP_STREAM       thread-48   1.00 (  0.20)   -1.41 (  0.01)
TCP_STREAM       thread-72   1.00 (  0.09)   -1.23 (  0.19)
TCP_STREAM       thread-96   1.00 (  0.01)   +0.01 (  0.01)
TCP_STREAM       thread-192  1.00 (  0.20)   -0.02 (  0.25)
UDP_RR           thread-24   1.00 (  2.20)   +0.84 ( 17.45)
UDP_RR           thread-48   1.00 (  1.34)   -0.73 (  1.12)
UDP_RR           thread-72   1.00 (  2.32)   +0.49 (  2.11)
UDP_RR           thread-96   1.00 (  2.36)   +0.53 (  2.42)
UDP_RR           thread-192  1.00 ( 16.34)   -0.67 ( 14.06)
UDP_STREAM       thread-24   1.00 (106.55)   -0.70 (107.13)
UDP_STREAM       thread-48   1.00 (105.11)   +1.60 (103.48)
UDP_STREAM       thread-72   1.00 (100.60)   +1.98 (101.13)
UDP_STREAM       thread-96   1.00 ( 99.91)   +2.59 (101.04)
UDP_STREAM       thread-192  1.00 (135.39)   -2.51 (108.00)

As expected, no obvious performance gain or loss is observed. As for the
issue we encountered, this patchset provides better worst-case behavior,
in that such OOM cases are reduced to some extent. Further fine-grained
traffic control remains something the workloads need to take care of
themselves.

Comments are welcome! Thanks!

Abel Wu (3):
  sock: Code cleanup on __sk_mem_raise_allocated()
  net-memcg: Record pressure level when under pressure
  sock: Throttle pressure-aware sockets under pressure

 include/linux/memcontrol.h | 39 +++++++++++++++++++++++++----
 include/net/sock.h         |  2 +-
 include/net/tcp.h          |  2 +-
 mm/vmpressure.c            |  9 ++++++-
 net/core/sock.c            | 51 +++++++++++++++++++++++++++++---------
 5 files changed, 83 insertions(+), 20 deletions(-)

--
2.37.3