[RFC] Suspicious bug in module refcounting

From: Karsten Keil
Date: Tue Feb 03 2009 - 08:47:35 EST


Hi,

While debugging a wired SCTP bug we hit from time to time
the BUG_ON(module_refcount(module) == 0) statement in __module_get().
After fixing the SCTP bug in final tests runs with lot of traffic
we still saw this bug message from time to time.
Looking at the refcounting in sctp, does not show up any forgotten
get module operation or some wrong put module calls.
We added some debug code and this shows only, that the test case heavly
change module_refcount(sctp) bacause the socket accept call also increase
the module count (and the BUG always was triggered in the __module_get() here).
If the socket close, it decrease the module refcount, so no problem here.
Some times the refcount seems to go very high, very quickly and later
goes down very quickly again, I think this occurs if the network stalls for
some time (the test case try to saturate a GB networklink).
So I had the idea that the bug is in the refcounting itself and not related
to the sctp code.
After looking closer at __module_get() and module_refcount(module) it looks
as I'm right, but I can not belive that this bug was not discovered before,
the code is here since long time.
The refcount is a per CPU atomic variable, module_refcount() simple add
in a fully unprotected loop (not disabled irqs, not protected against
scheduling) all per cpu values.

The issue is that so the process (it is as syscall from a userspace process)
get scheduled in the middle of the counting loop, so it already has counted
some per cpu vales, but not all.
Now while the process is not active, other processes modify the counts, some
accepts (== module gets) increase the count on already summed up CPUs,
some other release sockets on CPUs, which are not already summed up.
If the process become active again, it read the now decremented values from
later CPUS, so the total count is too low and may reach zero, in which the above
BUG_ON() will be triggered.

To prove this I replaced the BUG_ON with following code:

if (module) {
- BUG_ON(module_refcount(module) == 0);
+ unsigned int c = module_refcount(module);
+
+ if (unlikely(c == 0)) {
+ printk(KERN_ERR" module %s refcount=%x/%x\n", module->name, c, module_refcount(module));
+ module_dump_refcounts(module);
+ WARN_ON(1);
+ }
local_inc(&module->ref[get_cpu()].count);
put_cpu();
}

module_dump_refcounts() does print out the per cpu refcounts.
If my idea is correct the second call to module_refcount(module) should
have a none zero value.

And indeed:

Feb 2 03:29:51 pingi5 kernel: module sctp refcount=0/8e
Feb 2 03:29:51 pingi5 kernel: CPU0 refcount:38562331
Feb 2 03:29:51 pingi5 kernel: CPU1 refcount:-38562187
Feb 2 03:29:51 pingi5 kernel: Badness in __module_get at include/linux/module.h:374
Feb 2 03:29:51 pingi5 kernel:
Feb 2 03:29:51 pingi5 kernel: Call Trace: <ffffffff802837c2>{sys_accept+212} <ffffffff8019c76f>{dput+44}
Feb 2 03:29:51 pingi5 kernel: <ffffffff8018704f>{__fput+355} <ffffffff801a0dff>{mntput_no_expire+29}
Feb 2 03:29:51 pingi5 kernel: <ffffffff801845df>{filp_close+92} <ffffffff8010adba>{system_call+126}


Do my findings be correct, or do I miss something ?

A other thing is, why __module_get() should be used anyway, I think it was a
optimation long time ago, current code seems to need more cycles on default
SMP kernels as try_module_get(), because of the big loop in module_refcount()
which goes trough all possible CPUs (NR_CPU, 64 in default config).
try_module_get() only has one test and one atomic increment.


I think we should replace all unprotected __module_get() calls with
try_module_get(), or remove __module_get() completely.


Any comments ?


--
Karsten Keil
SuSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/