Re: Lockup with 2.6.9-ac15 related to netconsole

From: Mark Broadbent
Date: Wed Dec 22 2004 - 09:39:49 EST



Patrick McHardy said:
> Patrick McHardy wrote:
>> Francois Romieu wrote:
>>
>>> Marc, if the patch below happens to work, it should not drop messages
>>> like the previous one (it is an ugly short-term suggestion).
>>>
>>
>>> - spin_lock(&np->dev->xmit_lock);
>>> + while (!spin_trylock(&np->dev->xmit_lock)) {
>>> + if (np->dev->xmit_lock_owner == smp_processor_id()) {
>>> + struct Qdisc *q = dev->qdisc;
>>> +
>>> + q->ops->enqueue(skb, q);
>>
>>
>> Shouldn't this be requeue ?
>
> Since the code doesn't dequeue itself, enqueue seems fine to keep
> at least the queued messages ordered. But you need to grab
> dev->queue_lock, otherwise you risk corrupting qdisc internal data. You
> should probably also deal with the noqueue-qdisc, which doesn't have an
> enqueue function. So it should look something like this:
>
> while (!spin_trylock(&np->dev->xmit_lock)) {
> if (np->dev->xmit_lock_owner == smp_processor_id()) {
> struct Qdisc *q;
>
> rcu_read_lock();
> q = rcu_dereference(dev->qdisc);
> if (q->enqueue) {
> spin_lock(&dev->queue_lock);
> q->ops->enqueue(skb, q);
> spin_unlock(&dev->queue_lock);
> netif_schedule(np->dev);
> } else
> kfree_skb(skb);
> rcu_read_unlock();
> return;
> }
> }

I've tried both patches (modified slightly to get them to compile) but
they both produce hard NMI detected lockups (as before).
Thanks
Mark

Patches after modification to allow compilation:

Francois' patch (against 2.6.10-rc3-bk10):

diff -X dontdiff -urN linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c
linux-2.6.9-rc3-bk10/net/core/netpoll.c--- linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c 2004-12-22
12:09:32.000000000 +0000+++ linux-2.6.9-rc3-bk10/net/core/netpoll.c 2004-12-22 14:13:54.000000000
+0000@@ -22,6 +22,7 @@
#include <net/tcp.h>
#include <net/udp.h>
#include <asm/unaligned.h>
+#include <net/pkt_sched.h>

/*
* We maintain a small pool of fully-sized skbs, to make sure the
@@ -188,7 +189,15 @@
return;
}

- spin_lock(&np->dev->xmit_lock);
+ while (!spin_trylock(&np->dev->xmit_lock)) {
+ if (np->dev->xmit_lock_owner == smp_processor_id()) {
+ struct Qdisc *q = np->dev->qdisc;
+
+ q->ops->enqueue(skb, q);
+ return;
+ }
+ }
+
np->dev->xmit_lock_owner = smp_processor_id();

/*

Patrick's patch (against 2.6.10-rc3-bk10):

diff -X dontdiff -urN linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c
linux-2.6.9-rc3-bk10/net/core/netpoll.c--- linux-2.6.9-rc3-bk10.orig/net/core/netpoll.c 2004-12-22
12:09:32.000000000 +0000+++ linux-2.6.9-rc3-bk10/net/core/netpoll.c 2004-12-22 11:08:06.000000000
+0000@@ -22,6 +22,7 @@
#include <net/tcp.h>
#include <net/udp.h>
#include <asm/unaligned.h>
+#include <net/pkt_sched.h>

/*
* We maintain a small pool of fully-sized skbs, to make sure the
@@ -188,7 +189,24 @@
return;
}

- spin_lock(&np->dev->xmit_lock);
+ while (!spin_trylock(&np->dev->xmit_lock)) {
+ if (np->dev->xmit_lock_owner == smp_processor_id()) {
+ struct Qdisc *q;
+
+ rcu_read_lock();
+ q = rcu_dereference(np->dev->qdisc);
+ if (q->enqueue) {
+ spin_lock(&np->dev->queue_lock);
+ q->ops->enqueue(skb, q);
+ spin_unlock(&np->dev->queue_lock);
+ netif_schedule(np->dev);
+ } else
+ __kfree_skb(skb);
+ rcu_read_unlock();
+ return;
+ }
+ }
+
np->dev->xmit_lock_owner = smp_processor_id();

/*


--
Mark Broadbent <markb@xxxxxxxxxxxxxx>
Web: http://www.wetlettuce.com



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/