Re: nfsroot on multiple-e1000e serial-over-LAN system -> deadlock?

From: Nix
Date: Wed May 20 2009 - 18:27:59 EST


(e1000-devel, this is with an 82574L in 100Mb/s mode and upstream git
up-to-date as of a couple of days ago. Your driver works, modulo a small
patch and some unpleasant screaming in the log on boot: the in-tree one
doesn't work.)

On 19 May 2009, nix@xxxxxxxxxxxxx uttered the following:
> But then I come to a machine with multiple NICs and IPMI, and things
> fall over. I have to manually specify the NIC to use or it goes into a
> DHCP-probing deadlock (cause undiagnosed but it looks identical to this
> one so may be identical): but if I give the NIC info by hand, I *still*
> see a deadlock:
>
> [ 89.613880] IP-Config: Complete:
> [ 89.616943] device=eth0, addr=192.168.14.15, mask=255.255.255.0, gw=192.168.14.1,
> [ 89.624921] host=spindle, domain=, nis-domain=(none),
> [ 89.630430] bootserver=192.168.14.18, rootserver=192.168.14.18, rootpath=
> [ 90.333195] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
> [ 90.340668] 0000:03:00.0: eth0: 10/100 speed: disabling TSO
> [ 325.182384] INFO: task swapper:1 blocked for more than 120 seconds.
> [ 325.188653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 325.196473] swapper D 00000014 0 1 0
> [ 325.201766] f7061eec 00000046 dd66aa4a 00000014 00000000 00000000 00000000 c05d1480
> [ 325.209749] c05d1480 00000000 00000000 f705ec40 f705eed4 c2805480 00000000 ded7f8e3
> [ 325.217743] 00000014 00000000 c0548160 00000000 00000000 00000000 00000000 f705eed4
> [ 325.225742] Call Trace:
> [ 325.228202] [<c0408ebc>] schedule+0x8/0x17
> [ 325.232391] [<c0408fa6>] schedule_timeout+0x17/0x164
> [ 325.237454] [<c01346d1>] ? __wake_up+0x31/0x3b
> [ 325.241987] [<c040844e>] wait_for_common+0xaa/0xfc
> [ 325.246872] [<c013ae99>] ? default_wake_function+0x0/0xd
> [ 325.252271] [<c0408512>] wait_for_completion+0x12/0x14
> [ 325.257498] [<c014d003>] flush_cpu_workqueue+0x59/0x62
> [ 325.262720] [<c014ced7>] ? wq_barrier_func+0x0/0xd
> [ 325.267605] [<c014d177>] flush_workqueue+0x2b/0x49
> [ 325.272485] [<c014d1a2>] flush_scheduled_work+0xd/0xf
> [ 325.277626] [<c0585578>] kernel_init+0x10e/0x152
> [ 325.282340] [<c058546a>] ? kernel_init+0x0/0x152
> [ 325.287045] [<c011d8cf>] kernel_thread_helper+0x7/0x10
>
> Its cause is unclear.

sysrq-t suggests a cause:

[ 257.002484] ksoftirqd/3 R running 0 13 2
[ 257.007778] 00000000 00000000 00000040 f70aff8c f683205c f62d04c4 f62d03c0 00000040
[ 257.015744] 00000000 f70aff68 c0317c79 00000246 f62d04c4 f62d03c0 00000040 f62d04c4
[ 257.023704] 00000040 00000000 f70aff8c c03aae90 c28330f8 c283310c ffffcf91 000000ac
[ 257.031659] Call Trace:
[ 257.034113] [<c0317c79>] ? e1000_clean+0x5f/0x1f5
[ 257.038909] [<c03aae90>] ? net_rx_action+0x57/0x100
[ 257.043876] [<c0144567>] ? __do_softirq+0x121/0x129
[ 257.048836] [<c0144595>] ? do_softirq+0x26/0x2b
[ 257.053451] [<c01445e7>] ? ksoftirqd+0x4d/0xb7
[ 257.057988] [<c014459a>] ? ksoftirqd+0x0/0xb7
[ 257.062435] [<c014fece>] ? kthread+0x45/0x6b
[ 257.066796] [<c014fe89>] ? kthread+0x0/0x6b
[ 257.071068] [<c011d8cf>] ? kernel_thread_helper+0x7/0x10

Isn't e1000_clean supposed to be really fast? Hanging for many seconds
seems wrong.
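
For reference, e1000_clean is the driver's NAPI poll callback (hence
net_rx_action in the trace), and a poll callback is meant to do a bounded
amount of work per call. Below is a toy userspace model of that contract,
not driver code: toy_ring and toy_poll are made-up names, and the budget
of 64 used in the note afterwards is only assumed to match the usual NAPI
weight.

```c
#include <stddef.h>

/* Toy receive ring: 'pending' packets are waiting to be cleaned. */
struct toy_ring {
	size_t pending;
};

/*
 * Clean at most 'budget' packets and report how many were cleaned.
 * Returning less than 'budget' is how a poll routine signals that the
 * ring is drained; returning exactly 'budget' asks to be polled again.
 * Either way the call itself is bounded by 'budget'.
 */
static int toy_poll(struct toy_ring *ring, int budget)
{
	int done = 0;

	while (done < budget && ring->pending > 0) {
		ring->pending--;
		done++;
	}
	return done;
}
```

With 100 packets pending and a budget of 64, the first call cleans 64 and
the second cleans the remaining 36. Under this model a single call cannot
spin for seconds, which is why a ksoftirqd apparently stuck in the poll
routine looks suspicious.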


... but whatever the bug was, it's fixed in the out-of-tree e1000e
0.5.18.3, which works. Being a daredevil sort, and also doing an nfsroot
boot without an initramfs, I built it statically: this worked fine.

Why is the e1000e in the kernel tree based on such an old driver, anyway
(version 0.3.3.4 according to DRV_VERSION in netdev.c)?

All is not well with the out-of-tree driver, though: 0.5.18.3 doesn't
even build without the patch below, and screams loudly in the log at
startup, e.g.:

[ 93.041327] irq event 57: bogus return value f70b5eb4
[ 93.046871] Pid: 0, comm: swapper Not tainted 2.6.30-rc6-00114-g583172f-dirty #9
[ 93.054952] Call Trace:
[ 93.057649] [<c01662fa>] __report_bad_irq+0x2e/0x6f
[ 93.063098] [<c0166395>] note_interrupt+0x5a/0x149
[ 93.068428] [<c01668ab>] handle_edge_irq+0xdd/0x106
[ 93.073879] [<c011e7ae>] handle_irq+0x1a/0x20
[ 93.078731] [<c011e210>] do_IRQ+0x40/0x83
[ 93.083230] [<c011d4e9>] common_interrupt+0x29/0x30
[ 93.088673] [<c01400d8>] ? copy_process+0xe91/0xea8
[ 93.094125] [<c02b7e12>] ? acpi_idle_enter_c1+0xc8/0xd1
[ 93.099940] [<c02b7ede>] acpi_idle_enter_bm+0xc3/0x296
[ 93.105661] [<c0368dd3>] ? menu_select+0x39/0x9a
[ 93.110816] [<c0368386>] cpuidle_idle_call+0x60/0x92
[ 93.116197] [<c011c192>] cpu_idle+0x44/0x5e
[ 93.120874] [<c05ae8f2>] start_secondary+0x1b6/0x1be

(that's the *last* such message: the first scrolled out of the kernel
log, even with LOG_BUF_SHIFT of 16. Not ideal.)
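
For scale, the kernel log buffer holds 1 << CONFIG_LOG_BUF_SHIFT bytes, so
a shift of 16 is only 64 KiB. The arithmetic, as a trivial sketch
(log_buf_bytes is an illustrative helper, not a kernel function):

```c
/* Size in bytes of a log buffer configured with the given shift. */
static unsigned long log_buf_bytes(unsigned int shift)
{
	return 1UL << shift;
}
```

log_buf_bytes(16) is 65536 and each increment of the shift doubles it. The
log_buf_len= boot parameter is another way to get a larger buffer without
rebuilding.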

The message is mystifying, as every single IRQ handler in e1000e
0.5.18.3 returns IRQ_HANDLED or IRQ_NONE, so the message looks spurious
to me. (But then so does the 'incompatible pointer type' compilation
warning kicked up for argument 2 of every call to request_irq() in the
driver, so I'm obviously missing something because I doubt GCC is lying
here. But the prototypes look compatible to me...)
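
To make the complaint concrete, here is a userspace model of the check
that produces the 'bogus return value' line, as far as I understand it:
anything other than IRQ_NONE or IRQ_HANDLED coming back from a handler is
reported. The toy_ names are invented for this sketch, and the real check
lives in the kernel's spurious-IRQ code, which may differ in detail.

```c
/* Userspace stand-ins mirroring the kernel's IRQ_NONE and IRQ_HANDLED. */
enum toy_irqreturn {
	TOY_IRQ_NONE = 0,
	TOY_IRQ_HANDLED = 1,
};

/* Nonzero if a handler's return value is neither of the expected codes. */
static int toy_return_is_bogus(unsigned int action_ret)
{
	return action_ret != TOY_IRQ_NONE && action_ret != TOY_IRQ_HANDLED;
}
```

The logged value f70b5eb4 fails this test, and it looks more like a kernel
pointer than a return code, which would be consistent with the handler
being called through a mismatched prototype (the same thing the
request_irq() warning hints at). That is a guess, not a diagnosis.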


Vile patch to build with 2.6.30-rc: obviously not suitable for merging,
but what's mystifying is that the change that added the network namespace
parameter to __dev_get_by_name() is *old*, introduced in
881d966b48b035ab3f3aeaae0f3d3f9b584f45b2 in 2007! How has the e1000e
driver been building since then? Plainly it *has* for other people, but
I don't see how...
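
A less vile approach would be a version-gated shim in the compat layer.
The sketch below is hypothetical: compat_dev_get_by_name,
NETNS_LOOKUP_SINCE and lookup_takes_netns are illustrative names that do
not appear in the e1000e sources, and the 2.6.24 cutoff is my assumption
about when the namespace argument first shipped. The userspace branch
exists only so the version gate can be checked outside a kernel build.

```c
#ifdef __KERNEL__
#include <linux/version.h>
#include <linux/netdevice.h>
#else
/* Userspace stand-in using the same encoding as <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif

/* Assumption: __dev_get_by_name() gained its struct net * in 2.6.24. */
#define NETNS_LOOKUP_SINCE KERNEL_VERSION(2, 6, 24)

/* Nonzero if a kernel with this version code wants the extra argument. */
static inline int lookup_takes_netns(unsigned int version_code)
{
	return version_code >= NETNS_LOOKUP_SINCE;
}

#ifdef __KERNEL__
#if LINUX_VERSION_CODE >= NETNS_LOOKUP_SINCE
#include <net/net_namespace.h>
/* Newer kernels: look the device up in the initial namespace. */
#define compat_dev_get_by_name(name) __dev_get_by_name(&init_net, (name))
#else
/* Older kernels: single-argument lookup. */
#define compat_dev_get_by_name(name) __dev_get_by_name(name)
#endif
#endif
```

The call site in ethtool_ioctl() would then use
compat_dev_get_by_name(ifr->ifr_name) on every kernel version. Note that
hardwiring init_net, here as in the patch below, ignores any other network
namespace.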

(This patch probably would not be necessary if only I could find the
e1000e development tree to match the development kernel, but after much
searching of the mailing list archives via MARC's vile interface I have
found no clue as to where e1000e development actually happens. Some git
tree somewhere, presumably, but the only one I found a reference to was
one of Auke Kok's from 2006, which is gone. I hate out-of-tree drivers
sometimes.)

--- e1000e-0.5.18.3-orig/src/kcompat_ethtool.c 2009-03-05 18:43:14.000000000 +0000
+++ e1000e-0.5.18.3/src//kcompat_ethtool.c 2009-05-20 21:28:02.000000000 +0100
@@ -54,6 +54,7 @@
 #include <linux/ethtool.h>
 #include <linux/netdevice.h>
 #include <asm/uaccess.h>
+#include <net/net_namespace.h>

 #include "kcompat.h"

@@ -782,7 +783,7 @@
 #define ETHTOOL_OPS_COMPAT
 int ethtool_ioctl(struct ifreq *ifr)
 {
-	struct net_device *dev = __dev_get_by_name(ifr->ifr_name);
+	struct net_device *dev = __dev_get_by_name(&init_net, ifr->ifr_name);
 	void *useraddr = (void *) ifr->ifr_data;
 	u32 ethcmd;