[patch] fix spinlock-debug looping

From: Ingo Molnar
Date: Tue Jun 20 2006 - 04:43:54 EST



* Andrew Morton <akpm@xxxxxxxx> wrote:

> > > > The write_trylock + __delay in the loop is not a problem or a bug, as
> > > > the trylock will at most _increase_ the delay - and our goal is to not
> > > > have a false positive, not to be absolutely accurate about the
> > > > measurement here.
> > >
> > > Precisely. We have delays of over a second (but we don't know how
> > > much more than a second). Let's say two seconds. The NMI watchdog
> > > timeout is, what? Five seconds?
> >
> > i dont see the problem.
>
> It's taking over a second to acquire a write_lock. A lock which is
> unlikely to be held for more than a microsecond anywhere. That's
> really bad, isn't it? Being on the edge of an NMI watchdog induced
> system crash is bad, too.

i obviously agree that any such crash is a serious problem, but is it
caused by the spinlock-debugging code? I doubt it is, unless __delay()
is seriously buggered.

in any case, to move this problem forward i'd suggest we go with the
patch below in -mm and wait for feedback. It fixes a potential overflow
in the comparison (if HZ*lpj overflows 32-bits) That needs a really fast
box to run the 32-bit kernel though, so i doubt this is the cause of the
problems. In any case, this change makes it easier to increase the
looping timeout from 1 second to 10 seconds later on or so - at which
point the overflow can happen for real and must be handled .

Ingo

------------
Subject: fix spinlock-debug looping
From: Ingo Molnar <mingo@xxxxxxx>

make sure the right hand side of the comparison does not overflow
on 32-bits. Also print out more info when detecting a lockup, so
that we see how many times the code tried (and failed) to get the
lock.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
lib/spinlock_debug.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

Index: linux/lib/spinlock_debug.c
===================================================================
--- linux.orig/lib/spinlock_debug.c
+++ linux/lib/spinlock_debug.c
@@ -104,7 +104,7 @@ static void __spin_lock_debug(spinlock_t
u64 i;

for (;;) {
- for (i = 0; i < loops_per_jiffy * HZ; i++) {
+ for (i = 0; i < (u64)loops_per_jiffy * HZ; i++) {
if (__raw_spin_trylock(&lock->raw_lock))
return;
__delay(1);
@@ -112,10 +112,10 @@ static void __spin_lock_debug(spinlock_t
/* lockup suspected: */
if (print_once) {
print_once = 0;
- printk(KERN_EMERG "BUG: spinlock lockup on CPU#%d, "
- "%s/%d, %p\n",
+ printk(KERN_EMERG "BUG: possible spinlock lockup on CPU#%d, "
+ "%s/%d, %p [%Ld/%ld]\n",
raw_smp_processor_id(), current->comm,
- current->pid, lock);
+ current->pid, lock, i, loops_per_jiffy);
dump_stack();
}
}
@@ -169,7 +169,7 @@ static void __read_lock_debug(rwlock_t *
u64 i;

for (;;) {
- for (i = 0; i < loops_per_jiffy * HZ; i++) {
+ for (i = 0; i < (u64)loops_per_jiffy * HZ; i++) {
if (__raw_read_trylock(&lock->raw_lock))
return;
__delay(1);
@@ -177,10 +177,10 @@ static void __read_lock_debug(rwlock_t *
/* lockup suspected: */
if (print_once) {
print_once = 0;
- printk(KERN_EMERG "BUG: read-lock lockup on CPU#%d, "
- "%s/%d, %p\n",
+ printk(KERN_EMERG "BUG: possible read-lock lockup on CPU#%d, "
+ "%s/%d, %p [%Ld/%ld]\n",
raw_smp_processor_id(), current->comm,
- current->pid, lock);
+ current->pid, lock, i, loops_per_jiffy);
dump_stack();
}
}
@@ -242,7 +242,7 @@ static void __write_lock_debug(rwlock_t
u64 i;

for (;;) {
- for (i = 0; i < loops_per_jiffy * HZ; i++) {
+ for (i = 0; i < (u64)loops_per_jiffy * HZ; i++) {
if (__raw_write_trylock(&lock->raw_lock))
return;
__delay(1);
@@ -250,10 +250,10 @@ static void __write_lock_debug(rwlock_t
/* lockup suspected: */
if (print_once) {
print_once = 0;
- printk(KERN_EMERG "BUG: write-lock lockup on CPU#%d, "
- "%s/%d, %p\n",
+ printk(KERN_EMERG "BUG: possible write-lock lockup on CPU#%d, "
+ "%s/%d, %p [%Ld/%ld]\n",
raw_smp_processor_id(), current->comm,
- current->pid, lock);
+ current->pid, lock, i, loops_per_jiffy);
dump_stack();
}
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/