[PATCH] fix livelock in non atomic brlocks

From: Anton Blanchard (anton@samba.org)
Date: Wed Nov 14 2001 - 23:03:03 EST


Hi,

One characteristic of brlocks is that you must be able to take recursive
read locks (sparc64, ppc64 interrupt handling and netfilter make use of
this property). On a 16 cpu ppc64 machine I have seen the following
scenario:

cpu 1 (in br_write_lock)
again:
        spin_lock(&__br_write_locks[idx].lock);
                /* discover reader is present so backoff */
        spin_unlock(&__br_write_locks[idx].lock);
        goto again;

cpu 2 (in br_read_lock)
again:
        read_count++;
        mb();
        /* discover __br_write_locks[idx].lock is locked */
        read_count--;
        wmb();
        goto again;

If the read retry path is much slower than the write retry path (in this
case mb(); read_count--; wmb(); is quite slow), then both cpus will spin
forever and get nowhere.

We need to make sure the write retry path backs off long enough to
ensure forward progress. A udelay(1) is a rather large hammer but a
br_write_lock is known to be very slow and should not be called much.

Anton

--- 2.4.15-pre4/lib/brlock.c Thu Nov 15 13:38:04 2001
+++ linuxppc64_2_4_rochester/lib/brlock.c Thu Nov 15 13:33:35 2001
@@ -14,6 +14,7 @@
 
 #include <linux/sched.h>
 #include <linux/brlock.h>
+#include <linux/delay.h>
 
 #ifdef __BRLOCK_USE_ATOMICS
 
@@ -54,7 +55,8 @@
                 if (__brlock_array[cpu_logical_map(i)][idx] != 0) {
                         spin_unlock(&__br_write_locks[idx].lock);
                         barrier();
- cpu_relax();
+ /* We must allow recursive readers to make progress */
+ udelay(1);
                         goto again;
                 }
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Nov 15 2001 - 21:00:41 EST