[patch] race-fix for bottom-half-functions [Re: Subtle trouble in remove_bh().]

Andrea Arcangeli (andrea@e-mind.com)
Fri, 15 Jan 1999 00:59:45 +0100 (CET)


On Thu, 14 Jan 1999, Patrik Rak wrote:

> When we are already discussing the softirq.h issues here, there is another
> subtle bug in remove_bh(). This one affects all boxes.
>
> The problem is that this function first clears the handler pointer and
> than clears the mask bit - if an int would occur meanwhile, it would crash
> in run_bottom_halves().

I just seen that some weeks ago. The reason I have not fixed it at that
time is because I thought it's supposed to be used only in single threaded
code paths. Basically it's _only_ used in the cleanup_module or other
obviosly single threaded code.

And I don't really worry about that bug because it's _far_ to be subtle.
You'll see a nice Oops that tell you exactly what happened without
mistakes, and in some seconds you'll fix the bug in the driver.

> jj claims that all remove_bh() calls are protected by global cli, but I
> think there is one usage in drivers/scsi/scsi.c which is not...

That's not a big problem because it's in the cleanup_module(). So it's far
to harm during real production. Having a race when doing a cleanup_module
is not the worse thing we could have ;).

> I would suggest to change the code so it first clears the mask bit - it
> costs nothing and the function does not need the global cli anymore.

Agreed, it will make the code more robust with a tiny overhead. I repost
here the _whole_ fix for the bottom-half-function SMP races. I think this
should go in fast to avoid maintainers of other archs to fix things two
times ;)

Index: linux/kernel/softirq.c
diff -u linux/kernel/softirq.c:1.1.1.3 linux/kernel/softirq.c:1.1.1.1.2.5
--- linux/kernel/softirq.c:1.1.1.3 Wed Dec 23 15:25:17 1998
+++ linux/kernel/softirq.c Mon Dec 28 03:26:29 1998
@@ -20,7 +20,8 @@

/* intr_count died a painless death... -DaveM */

-atomic_t bh_mask_count[32];
+spinlock_t bh_lock = SPIN_LOCK_UNLOCKED;
+int bh_mask_count[32];
unsigned long bh_active = 0;
unsigned long bh_mask = 0;
void (*bh_base[32])(void);
Index: linux/include/asm-i386/softirq.h
diff -u linux/include/asm-i386/softirq.h:1.1.1.2 linux/include/asm-i386/softirq.h:1.1.1.1.2.6
--- linux/include/asm-i386/softirq.h:1.1.1.2 Wed Dec 23 15:22:50 1998
+++ linux/include/asm-i386/softirq.h Fri Jan 15 00:54:35 1999
@@ -12,14 +12,16 @@
extern inline void init_bh(int nr, void (*routine)(void))
{
bh_base[nr] = routine;
- atomic_set(&bh_mask_count[nr], 0);
+ bh_mask_count[nr] = 0;
+ wmb();
bh_mask |= 1 << nr;
}

extern inline void remove_bh(int nr)
{
- bh_base[nr] = NULL;
bh_mask &= ~(1 << nr);
+ wmb();
+ bh_base[nr] = NULL;
}

extern inline void mark_bh(int nr)
@@ -96,15 +98,23 @@
*/
extern inline void disable_bh(int nr)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&bh_lock, flags);
bh_mask &= ~(1 << nr);
- atomic_inc(&bh_mask_count[nr]);
+ bh_mask_count[nr]++;
+ spin_unlock_irqrestore(&bh_lock, flags);
synchronize_bh();
}

extern inline void enable_bh(int nr)
{
- if (atomic_dec_and_test(&bh_mask_count[nr]))
+ unsigned long flags;
+
+ spin_lock_irqsave(&bh_lock, flags);
+ if (!--bh_mask_count[nr])
bh_mask |= 1 << nr;
+ spin_unlock_irqrestore(&bh_lock, flags);
}

#endif /* __ASM_SOFTIRQ_H */
Index: linux/include/linux/interrupt.h
diff -u linux/include/linux/interrupt.h:1.1.1.3 linux/include/linux/interrupt.h:1.1.1.1.2.6
--- linux/include/linux/interrupt.h:1.1.1.3 Wed Dec 23 15:24:21 1998
+++ linux/include/linux/interrupt.h Mon Dec 28 03:26:27 1998
@@ -17,7 +17,8 @@

extern volatile unsigned char bh_running;

-extern atomic_t bh_mask_count[32];
+extern spinlock_t bh_lock;
+extern int bh_mask_count[32];
extern unsigned long bh_active;
extern unsigned long bh_mask;
extern void (*bh_base[32])(void);

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/