Re: [RFC 2/5] workqueue: Warn when a new worker could not be created

From: Michal Koutný
Date: Wed Feb 15 2023 - 13:02:12 EST


Hello.

On Wed, Feb 01, 2023 at 02:45:40PM +0100, Petr Mladek <pmladek@xxxxxxxx> wrote:
> + the system reached PID limit
or threads-max limit.

FTR, I was once considering something like

--->8---
diff --git a/kernel/fork.c b/kernel/fork.c
index 867b46d6fd0a..bba05ecc3765 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1684,8 +1684,10 @@ static __latent_entropy struct task_struct *copy_process(
 	 * to stop root fork bombs.
 	 */
 	retval = -EAGAIN;
-	if (nr_threads >= max_threads)
+	if (nr_threads >= max_threads) {
+		printk_once(KERN_INFO "clone failed due to threads-max limit\n");
 		goto bad_fork_cleanup_count;
+	}

 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
 	p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE);
@@ -1816,6 +1818,7 @@ static __latent_entropy struct task_struct *copy_process(
 	if (pid != &init_struct_pid) {
 		pid = alloc_pid(p->nsproxy->pid_ns_for_children);
 		if (IS_ERR(pid)) {
+			printk_once(KERN_INFO "fork failed to find pid\n");
 			retval = PTR_ERR(pid);
 			goto bad_fork_cleanup_thread;
 		}
--->8---

The effects of the global limits on anything but kthreads should be much less
important and easier to troubleshoot anyway.
Covering kworkers with your changes should be useful and can replace my idea
above. Take that as my support for this patch (from my perspective, reporting
with *_once would be enough to guide a troubleshooter).
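
To illustrate what I mean by *_once reporting being enough, here is a rough
userspace sketch. It is only a model, not the kernel's printk_once()
implementation: report_once(), try_clone() and limit_reports are hypothetical
names made up for the example, and the limit check merely mirrors the
nr_threads >= max_threads test. Every failing call still gets -EAGAIN, but the
log gets a single line.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for a once-only report.
 * Returns true only for the call that actually emitted the message.
 */
static bool report_once(bool *reported, const char *msg)
{
	if (*reported)
		return false;
	*reported = true;
	fprintf(stderr, "%s\n", msg);
	return true;
}

/* Counts emitted messages so the behaviour can be checked. */
static int limit_reports;

/*
 * Hypothetical model of the threads-max check: fails with -EAGAIN
 * whenever the limit is reached, but reports the reason only once.
 */
static int try_clone(int nr_threads, int max_threads)
{
	static bool reported;

	if (nr_threads >= max_threads) {
		if (report_once(&reported, "clone failed due to threads-max limit"))
			limit_reports++;
		return -EAGAIN;
	}
	return 0;
}
```

Calling try_clone() repeatedly at the limit keeps returning -EAGAIN while
limit_reports stays at 1, which is about as much as a troubleshooter needs to
know which limit to look at.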

Thanks,
Michal

