Re: [PATCH] arp_queue: serializing unlink + kfree_skb

From: Herbert Xu
Date: Thu Feb 03 2005 - 06:29:18 EST


On Wed, Feb 02, 2005 at 04:20:23PM -0800, David S. Miller wrote:
>
> > 	if (atomic_read(&skb->users) != 1) {
> > 		smp_mb__before_atomic_dec();
> > 		if (!atomic_dec_and_test(&skb->users))
> > 			return;
> > 	}
> > 	__kfree_skb(skb);
>
> This looks good. Olaf can you possibly ask the reproducer if
> this patch makes the ARP problem go away?

Not so hasty, Dave :)

That was only meant to be an example of what we might do to
insert a write barrier in kfree_skb().

It doesn't solve Olaf's problem because a write barrier by itself is
usually not sufficient.  In most cases you'll need a read barrier as
well, so that the CPU which ends up doing the final free actually sees
the writes made by the CPU that dropped its reference earlier.
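
To make that concrete, here's a rough sketch of the sort of
interleaving we're worried about.  The two functions are made up
purely for illustration (they're not real kernel paths, and queue
locking is omitted); assume both CPUs start out holding a reference
to the same skb, i.e. skb->users == 2:

#include <linux/skbuff.h>

/* CPU 0: writes to the skb and then drops its reference. */
static void cpu0_drop(struct sk_buff *skb, struct sk_buff_head *queue)
{
	__skb_unlink(skb, queue);	/* stores to skb->next, skb->prev, ... */
	kfree_skb(skb);			/* users 2 -> 1: without a write barrier
					 * before the decrement, the unlink
					 * stores may still be in flight when
					 * CPU 1 sees the count hit zero. */
}

/* CPU 1: drops the last reference and actually frees the skb. */
static void cpu1_drop(struct sk_buff *skb)
{
	kfree_skb(skb);			/* users 1 -> 0: without a read barrier
					 * after the decrement, __kfree_skb()
					 * may look at skb state from before
					 * CPU 0's stores became visible. */
}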

So if we're going for a solution that only involves kfree_skb()
then we'll need the following patch.

I got rid of the atomic_read optimisation since it would've needed
an additional smp_rmb() to be safe.
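
For reference, keeping that fast path safely would have looked
something like this (this is what I decided against, not what the
patch below does):

static inline void kfree_skb(struct sk_buff *skb)
{
	if (atomic_read(&skb->users) == 1) {
		/* We're the last holder, but the CPU that dropped the
		 * other reference may have written to the skb just
		 * before doing so.  Those stores must be visible
		 * before __kfree_skb() reads and frees the skb. */
		smp_rmb();
		__kfree_skb(skb);
		return;
	}
	smp_mb__before_atomic_dec();
	if (!atomic_dec_and_test(&skb->users))
		return;
	smp_mb__after_atomic_dec();
	__kfree_skb(skb);
}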

I thought about preserving that optimisation through something
like skb->cloned. However, it turns out that most skb's will be
shared at least once so it doesn't buy us much.

Signed-off-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>

What I hoped to do though was to bring your collective attention
to the problem in general. kfree_skb() is certainly not the only
place where we free an object after the counter hits zero.

This paradigm is repeated throughout the kernel, and I bet the same
race can be found in many of those places.  So we really need to sit
down and audit them one by one, or else come up with a magical
solution apart from disabling SMP :)
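
Just to spell out the shape of the pattern (struct foo here is made
up, not any particular kernel object), the fix in each such place
would look roughly like:

#include <linux/slab.h>
#include <asm/atomic.h>

struct foo {
	atomic_t refcnt;
	/* ... object state ... */
};

static inline void foo_put(struct foo *f)
{
	/* Order our earlier stores to *f before the decrement, so the
	 * CPU that ends up freeing the object can see them. */
	smp_mb__before_atomic_dec();
	if (!atomic_dec_and_test(&f->refcnt))
		return;
	/* Make the other CPUs' pre-decrement stores visible before we
	 * tear the object down. */
	smp_mb__after_atomic_dec();
	kfree(f);
}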

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
===== include/linux/skbuff.h 1.59 vs edited =====
--- 1.59/include/linux/skbuff.h 2005-01-11 07:23:55 +11:00
+++ edited/include/linux/skbuff.h 2005-02-03 21:57:26 +11:00
@@ -353,15 +353,21 @@
  */
 static inline void kfree_skb(struct sk_buff *skb)
 {
-	if (atomic_read(&skb->users) == 1 || atomic_dec_and_test(&skb->users))
-		__kfree_skb(skb);
+	smp_mb__before_atomic_dec();
+	if (!atomic_dec_and_test(&skb->users))
+		return;
+	smp_mb__after_atomic_dec();
+	__kfree_skb(skb);
 }
 
 /* Use this if you didn't touch the skb state [for fast switching] */
 static inline void kfree_skb_fast(struct sk_buff *skb)
 {
-	if (atomic_read(&skb->users) == 1 || atomic_dec_and_test(&skb->users))
-		kfree_skbmem(skb);
+	smp_mb__before_atomic_dec();
+	if (!atomic_dec_and_test(&skb->users))
+		return;
+	smp_mb__after_atomic_dec();
+	kfree_skbmem(skb);
 }
 
 /**