FYI: PoC: Running 100000 processes in 5.3.18 (SLES15 SP2)

From: Ulrich Windl
Date: Fri Oct 02 2020 - 06:06:13 EST


Hi!

Just in case someone is interested: As a Proof-of-Concept I started 100 thousand processes on a big machine (72 cores). It worked!
However, starting them took more than 30 minutes, and top needs more than 30 minutes to refresh its display. Still, interactive input via SSH works nicely, but any file-system access seems quite slow (my test processes just use CPU; they do not do any I/O).
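The post does not say how the test processes were started. As a purely hypothetical illustration of "processes that just use CPU and do no I/O", the sketch below forks n children that spin in a busy loop and later terminates and reaps them. The function names are my own. For anything near 100000 processes, limits such as the per-user process limit (`ulimit -u`) and `kernel.pid_max` may also come into play.

```python
import os
import signal


def spawn_cpu_burners(n: int) -> list:
    """Fork n children that only burn CPU (no file or network I/O)."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: spin forever; it never touches files or sockets.
            while True:
                pass
        pids.append(pid)
    return pids


def reap(pids: list) -> list:
    """Send SIGTERM to each child, wait for it, and return the terminating signal (0 if none)."""
    signals = []
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        _, status = os.waitpid(pid, 0)
        signals.append(os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0)
    return signals
```

Usage: `pids = spawn_cpu_burners(4)` starts four spinning children; `reap(pids)` stops them again. The children rely on the default SIGTERM action, so no handler is installed.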

Kernel messages while the processes were created:
kernel: [65648.247688] perf: interrupt took too long (2516 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
kernel: [65997.263218] perf: interrupt took too long (3146 > 3145), lowering kernel.perf_event_max_sample_rate to 63500
kernel: [66790.221057] perf: interrupt took too long (3938 > 3932), lowering kernel.perf_event_max_sample_rate to 50750
kernel: [69884.371426] perf: interrupt took too long (4925 > 4922), lowering kernel.perf_event_max_sample_rate to 40500

Last top output (more than 30 minutes late):
top - 11:16:19 up 19:19, 3 users, load average: 64164.56, 62997.24, 55597.09
Tasks: 101432 total, 60249 running, 41183 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.0 us, 2.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 772127.6+total, 755924.2+free, 14253.01+used, 1950.363 buff/cache
MiB Swap: 773120.0+total, 772958.1+free, 161.816 used. 754248.8+avail Mem
...

That's a load, isn't it? ;-)

# cat /proc/uptime
72084.21 9356423.41
# cat /proc/loadavg
64188.31 64188.81 63636.08 64228/102328 134935
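For reference, as documented in proc(5): /proc/loadavg holds the 1-, 5- and 15-minute load averages, then the number of currently runnable kernel scheduling entities over the total number of entities, then the most recently assigned PID. /proc/uptime holds the uptime in seconds and the idle time summed over all CPUs, which is why the second value can exceed the first on a multi-CPU machine. A minimal Python sketch that splits the lines quoted above into those fields (the helper names are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LoadAvg:
    load1: float    # 1-minute load average
    load5: float    # 5-minute load average
    load15: float   # 15-minute load average
    runnable: int   # currently runnable scheduling entities
    total: int      # total scheduling entities (processes/threads)
    last_pid: int   # most recently assigned PID


def parse_loadavg(line: str) -> LoadAvg:
    """Split one line of /proc/loadavg into its five fields."""
    l1, l5, l15, tasks, last_pid = line.split()
    runnable, total = tasks.split("/")
    return LoadAvg(float(l1), float(l5), float(l15),
                   int(runnable), int(total), int(last_pid))


def parse_uptime(line: str) -> tuple[float, float]:
    """Return (uptime_seconds, idle_seconds_summed_over_all_cpus) from /proc/uptime."""
    up, idle = line.split()
    return float(up), float(idle)
```

With the values above, `parse_loadavg("64188.31 64188.81 63636.08 64228/102328 134935")` yields 64228 runnable out of 102328 total entities, and `parse_uptime("72084.21 9356423.41")` gives an uptime of 72084.21 s (about 20 h 1 min).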

Regards,
Ulrich