Re: Recent 2.3.x kernels hang on startup

Keith Owens (kaos@ocs.com.au)
Fri, 31 Dec 1999 16:14:07 +1100


On Thu, 30 Dec 1999 23:30:52 -0500,
"Jeff Millar" <jeff@wa1hco.mv.com> wrote:
>Lately, kernels in the range 2.3.25 - 2.3.35 hang on startup or start
>strangely, while 2.2.13 works fine.
>
>The symptoms include
>
> hanging after the "starting init" message, but alt/sysrq keys work
> a stream of "VM killing process modprobe" messages
> one "VM killing process modprobe" message and then a hang
>
>The later kernels hang hard, but ~2.3.31 put out the stream of messages

(1) Compile Unix sockets into the kernel instead of building them as a
module, or

(2) Apply this patch, which limits the problem. It has been sent to
Linus 3 or 4 times without response.

Against 2.3.35, stops runaway loops in kmod caused by "modprobe needs a
service that is in a module".

Index: 35.4/Documentation/kmod.txt
--- 35.4/Documentation/kmod.txt Fri, 14 May 1999 15:55:23 +1000 keith (linux-2.3/40_kmod.txt 1.1 644)
+++ 35.4(w)/Documentation/kmod.txt Wed, 29 Dec 1999 22:39:52 +1100 keith (linux-2.3/40_kmod.txt 1.2 644)
@@ -45,3 +45,24 @@

- kmod reports errors through the normal kernel mechanisms, which avoids
the chicken and egg problem of kerneld and modular Unix domain sockets
+
+
+Keith Owens <kaos@ocs.com.au> December 1999
+
+The combination of kmod and modprobe can loop, especially if modprobe uses a
+system call that requires a module. If modules.dep does not exist and modprobe
+was started with the -s option (kmod does this), modprobe tries to syslog() a
+message. syslog() needs Unix sockets, if Unix sockets are modular then kmod
+runs "modprobe -s net-pf-1". This runs a second copy of modprobe which
+complains that modules.dep does not exist, tries to use syslog() and starts yet
+another copy of modprobe. This is not the only possible kmod/modprobe loop,
+just the most common.
+
+To detect loops caused by "modprobe needs a service which is in a module", kmod
+limits the number of concurrent kmod issued modprobes. See MAX_KMOD_CONCURRENT
+in kernel/kmod.c. When this limit is exceeded, the kernel issues message "kmod:
+runaway modprobe loop assumed and stopped".
+
+Note for users building a heavily modularised system. It is a good idea to
+create modules.dep after installing the modules and before booting a kernel for
+the first time. "depmod -ae m.n.p" where m.n.p is the new kernel version.
Index: 35.4/kernel/kmod.c
--- 35.4/kernel/kmod.c Fri, 12 Nov 1999 18:53:00 +1100 keith (linux-2.3/F/b/30_kmod.c 1.2 644)
+++ 35.4(w)/kernel/kmod.c Wed, 29 Dec 1999 22:39:55 +1100 keith (linux-2.3/F/b/30_kmod.c 1.3 644)
@@ -7,6 +7,10 @@

Modified to avoid chroot and file sharing problems.
Mikael Pettersson
+
+ Limit the concurrent number of kmod modprobes to catch loops from
+ "modprobe needs a service that is in a module".
+ Keith Owens <kaos@ocs.com.au> December 1999
*/

#define __KERNEL_SYSCALLS__
@@ -22,6 +26,8 @@
*/
char modprobe_path[256] = "/sbin/modprobe";

+extern int max_threads;
+
static inline void
use_init_fs_context(void)
{
@@ -113,6 +119,10 @@
int pid;
int waitpid_result;
sigset_t tmpsig;
+ int i;
+ static atomic_t kmod_concurrent = ATOMIC_INIT(0);
+#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
+ static int kmod_loop_msg;

/* Don't allow request_module() before the root fs is mounted! */
if ( ! current->fs->root ) {
@@ -121,9 +131,31 @@
return -EPERM;
}

+ /* If modprobe needs a service that is in a module, we get a recursive
+ * loop. Limit the number of running kmod threads to max_threads/2 or
+ * MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
+ * would be to run the parents of this process, counting how many times
+ * kmod was invoked. That would mean accessing the internals of the
+ * process tables to get the command line, proc_pid_cmdline is static
+ * and it is not worth changing the proc code just to handle this case.
+ * KAO.
+ */
+ i = max_threads/2;
+ if (i > MAX_KMOD_CONCURRENT)
+ i = MAX_KMOD_CONCURRENT;
+ atomic_inc(&kmod_concurrent);
+ if (atomic_read(&kmod_concurrent) > i) {
+ if (kmod_loop_msg++ < 5)
+ printk(KERN_ERR
+ "kmod: runaway modprobe loop assumed and stopped\n");
+ atomic_dec(&kmod_concurrent);
+ return -ENOMEM;
+ }
+
pid = kernel_thread(exec_modprobe, (void*) module_name, 0);
if (pid < 0) {
printk(KERN_ERR "request_module[%s]: fork failed, errno %d\n", module_name, -pid);
+ atomic_dec(&kmod_concurrent);
return pid;
}

@@ -135,6 +167,7 @@
spin_unlock_irq(&current->sigmask_lock);

waitpid_result = waitpid(pid, NULL, __WCLONE);
+ atomic_dec(&kmod_concurrent);

/* Allow signals again.. */
spin_lock_irq(&current->sigmask_lock);
