Re: Re: [PATCH] pid: add handling of too many zombie processes

From: Michal Hocko
Date: Mon Feb 13 2023 - 08:25:50 EST


On Thu 09-02-23 15:14:57, huyd12@xxxxxxxxxxxxxxx wrote:
>
> Any comments will be appreciated.
>
>
>
> -----Original Message-----
> From: liuq131@xxxxxxxxxxxxxxx <liuq131@xxxxxxxxxxxxxxx>
> Sent: February 8, 2023 17:49
> To: akpm@xxxxxxxxxxxxxxxxxxxx
> Cc: agruenba@xxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> huyd12@xxxxxxxxxxxxxxx; liuq <liuq131@xxxxxxxxxxxxxxx>
> Subject: [PATCH] pid: add handling of too many zombie processes
>
> A common situation is that a parent process forks many child processes
> to execute tasks but does not call wait()/waitpid() when the children
> exit, so a large number of them become zombie processes.
>
> If the number of processes in the system then reaches kernel.pid_max,
> new fork() calls fail and the system cannot execute any command until an
> old process exits.
>
> For example:
> [root@lq-workstation ~]# ls
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: Resource temporarily unavailable
> [root@lq-workstation ~]# reboot
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: retry: Resource temporarily unavailable
> -bash: fork: Resource temporarily unavailable
>
> I handle this situation in the alloc_pid function: it finds the process
> with the most zombie children and, if that process has more than 10 (or
> some other reasonable value?) zombie children, kills it to release the
> pid resources.

Abusing oom_kill_process is not the right approach. Also, any hard-coded
limit for the number of zombies can turn out to be really tricky and can
cause regressions.

Is there any reason you cannot contain those misbehaving workloads in a
pid controller?
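A minimal sketch of that containment, assuming cgroup v2 with the pids controller enabled in the parent group's cgroup.subtree_control and sufficient privileges. The cgroup directory (for example a hypothetical /sys/fs/cgroup/zombie-workload) and the helper names are illustrative, not part of any existing tool. Writing pids.max caps how many tasks the group may hold; zombies stay charged until reaped, so a leaking workload hits its own limit (fork() fails with EAGAIN inside the group) while the rest of the system can still allocate pids.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a single value to <cgroup_dir>/<file>, like `echo value > file`
 * in a shell. Returns 0 on success, -1 on error. */
int cgroup_write(const char *cgroup_dir, const char *file, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    /* stdio buffers the write; a kernel rejection (e.g. EINVAL) shows up
     * when the buffer is flushed by fclose. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Cap the number of tasks (processes and threads) in the cgroup. */
int set_pids_max(const char *cgroup_dir, long max)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%ld\n", max);
    return cgroup_write(cgroup_dir, "pids.max", buf);
}

/* Move an existing process (e.g. the forking parent) into the cgroup;
 * children it forks afterwards are charged to the same cgroup. */
int add_pid_to_cgroup(const char *cgroup_dir, pid_t pid)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%ld\n", (long)pid);
    return cgroup_write(cgroup_dir, "cgroup.procs", buf);
}
```

Typical use would be creating the group directory, calling set_pids_max() with a limit well below kernel.pid_max, and then add_pid_to_cgroup() for the parent before it starts forking workers.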
--
Michal Hocko
SUSE Labs