Panic at __blk_run_queue on 2.6.32

From: Rich, Jason
Date: Tue Jul 09 2013 - 13:43:44 EST


Greetings,
I've recently encountered an issue where multiple hosts fail to boot roughly one time in five. So far I have confirmed this
issue on three separate host machines. The issue appears after updating from 2.6.32.39 to patch levels 50 and 61 (2.6.32.50 and 2.6.32.61).
Both patch levels result in the failure described below. Since this occurs on multiple hosts, I feel I can safely rule out hardware.

It is also worth noting that I have not seen this behavior on the 3.4.26 kernel, or on any of my 32-bit hosts.
That said, I have to support this software release (which runs on the 2.6 kernel) for at least another two years.
I've looked through the open and closed issues on Bugzilla and see nothing similar.
The console log of the crash is below, followed by the output from the crash dump (using the crash utility).
Lsmod, lspci & kernel config attached.
I'm at a loss and consider myself a novice at debugging kernel issues. Any help is greatly appreciated.

Some details about the host:
1x Intel Xeon L5518 (quad core + HT)
32 GB DDR3
Onboard eUSB
This is an ATCA blade (no doubt irrelevant to the issue)


From the console:
initramfs bootup: 2.6.32.61.TEK.V7.12.1.5024.p61 x86_64
<initramfs bootup...truncated as irrelevant to the issue at hand>

BOOT_IMAGE=/boot/bzImage-2.6.32.61.TEK.V7.12.1.5024.p61 -> /boot/bzImage-2.6.32.61.TEK.V7.12.1.5024.p61
<initramfs bootup...truncated as irrelevant to the issue at hand>

Setting kernel variables ... /etc/sysctl.conf...done.
Setting up X server socket directory /tmp/.X11-unix....
Setting up ICE socket directory /tmp/.ICE-unix....
Starting portmap daemon....
[ 30.757040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 30.765242] IP: [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[ 30.771296] PGD 0
[ 30.773408] Oops: 0000 [#1] PREEMPT SMP
[ 30.777525] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb1/1-3/1-3:1.0/host9/target9:0:0/9:0:0:0/scsi_device/9:0:0:0/uevent
[ 30.790203] CPU 0
[ 30.792253] Modules linked in: mptctl ipmi_poweroff igb ixgbe usb_storage ahci mptsas mptscsih mptbase scsi_transport_sas edd i2c_dev ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: usb_storage]
[ 30.812346] Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.32.61.TEK.V7.12.1.5024.p61 #1 ATCA-4500
[ 30.821280] RIP: 0010:[<ffffffff811c1eeb>] [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[ 30.829911] RSP: 0018:ffff880028203d28 EFLAGS: 00010046
[ 30.835445] RAX: 0000000000000000 RBX: ffff88083cf7ec98 RCX: ffff88083cf7ec98
[ 30.842885] RDX: ffff88083d3a88c0 RSI: 0000000000000292 RDI: ffff88083cf7ec98
[ 30.850226] RBP: ffff880028203d28 R08: ffff880028203d68 R09: 0000000000ade46b
[ 30.857708] R10: ffff88083ced5050 R11: ffff880028203d68 R12: 0000000000000292
[ 30.865182] R13: ffff880028203d98 R14: ffff88083ced5050 R15: 0000000000000000
[ 30.872594] FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
[ 30.881111] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 30.887087] CR2: 0000000000000040 CR3: 000000083db0a000 CR4: 00000000000006f0
[ 30.894484] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 30.901905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 30.909376] Process ksoftirqd/0 (pid: 4, threadinfo ffff88083f8a6000, task ffff88083f884080)
[ 30.918231] Stack:
[ 30.920368] ffff880028203d48 ffffffff811c434b ffff88083cf7ec98 ffff88083cf7ec98
[ 30.927955] <0> ffff880028203d68 ffffffff811c43be ffff88083ced5000 ffff88083cf7ec98
[ 30.936074] <0> ffff880028203dd8 ffffffff812a9040 ffff880028203d88 ffff880028203d98
[ 30.944283] Call Trace:
[ 30.946853] <IRQ>
[ 30.949053] [<ffffffff811c434b>] __blk_run_queue+0x22/0x74
[ 30.954855] [<ffffffff811c43be>] blk_run_queue+0x21/0x35
[ 30.960479] [<ffffffff812a9040>] scsi_run_queue+0x20a/0x2a8
[ 30.966331] [<ffffffff812a9be7>] scsi_next_command+0x36/0x46
[ 30.972265] [<ffffffff812aa193>] scsi_end_request+0x7e/0x8f
[ 30.978145] [<ffffffff812aa4aa>] scsi_io_completion+0x16b/0x396
[ 30.984382] [<ffffffff812a46fc>] scsi_finish_command+0xb0/0xb9
[ 30.990575] [<ffffffff812aa7d8>] scsi_softirq_done+0xf3/0xfc
[ 30.996578] [<ffffffff811c8f27>] blk_done_softirq+0x67/0x77
[ 31.002493] [<ffffffff81042fe7>] __do_softirq+0xaa/0x147
[ 31.008176] [<ffffffff8100cc0c>] call_softirq+0x1c/0x28
[ 31.013640] <EOI>
[ 31.015819] [<ffffffff8100e023>] do_softirq+0x33/0x6b
[ 31.021149] [<ffffffff810431e5>] ksoftirqd+0x82/0x149
[ 31.026484] [<ffffffff81043163>] ? ksoftirqd+0x0/0x149
[ 31.031883] [<ffffffff81050ebd>] kthread+0x7a/0x82
[ 31.036948] [<ffffffff8100cb0a>] child_rip+0xa/0x20
[ 31.042174] [<ffffffff81050e43>] ? kthread+0x0/0x82
[ 31.047294] [<ffffffff8100cb00>] ? child_rip+0x0/0x20
[ 31.052655] Code: 87 e0 00 00 00 48 8b 47 08 48 89 77 08 48 89 3e 48 89 46 08 48 89 30 c9 c3 31 c0 48 39 3f 55 48 8b 57 18 48 89 e5 75 13 48 8b 02 <48> 8b 50 40 b8 01 00 00 00 48 85 d2 74 02 ff d2 c9 c3 48 8b 47
[ 31.072919] RIP [<ffffffff811c1eeb>] elv_queue_empty+0x12/0x24
[ 31.079094] RSP <ffff880028203d28>
[ 31.082736] CR2: 0000000000000040
[ 31.086171] ---[ end trace d6541ba31725c49a ]---
[ 31.090995] Kernel panic - not syncing: Fatal exception in interrupt
[ 31.097613] Pid: 4, comm: ksoftirqd/0 Tainted: G D 2.6.32.61.TEK.V7.12.1.5024.p61 #1
[ 31.106415] Call Trace:
[ 31.108992] <IRQ> [<ffffffff81424681>] panic+0x84/0x139
[ 31.114663] [<ffffffff814273bd>] oops_end+0xa9/0xb9
[ 31.119842] [<ffffffff810280ac>] no_context+0x136/0x142
[ 31.125361] [<ffffffff8105ccb1>] ? tick_program_event+0x25/0x27
[ 31.131604] [<ffffffff8102822a>] __bad_area_nosemaphore+0x172/0x195
[ 31.138161] [<ffffffff8103671c>] ? try_to_wake_up+0x294/0x2af
[ 31.144223] [<ffffffff8102825b>] bad_area_nosemaphore+0xe/0x10
[ 31.150441] [<ffffffff81428919>] do_page_fault+0x14a/0x281
[ 31.156245] [<ffffffff814268ff>] page_fault+0x1f/0x30
[ 31.161633] [<ffffffff811c1eeb>] ? elv_queue_empty+0x12/0x24
[ 31.167644] [<ffffffff811c434b>] __blk_run_queue+0x22/0x74
[ 31.173421] [<ffffffff811c43be>] blk_run_queue+0x21/0x35
[ 31.179068] [<ffffffff812a9040>] scsi_run_queue+0x20a/0x2a8
[ 31.184994] [<ffffffff812a9be7>] scsi_next_command+0x36/0x46
[ 31.190927] [<ffffffff812aa193>] scsi_end_request+0x7e/0x8f
[ 31.196808] [<ffffffff812aa4aa>] scsi_io_completion+0x16b/0x396
[ 31.203044] [<ffffffff812a46fc>] scsi_finish_command+0xb0/0xb9
[ 31.209261] [<ffffffff812aa7d8>] scsi_softirq_done+0xf3/0xfc
[ 31.215203] [<ffffffff811c8f27>] blk_done_softirq+0x67/0x77
[ 31.221067] [<ffffffff81042fe7>] __do_softirq+0xaa/0x147
[ 31.226619] [<ffffffff8100cc0c>] call_softirq+0x1c/0x28
[ 31.232170] <EOI> [<ffffffff8100e023>] do_softirq+0x33/0x6b
[ 31.238174] [<ffffffff810431e5>] ksoftirqd+0x82/0x149
[ 31.243511] [<ffffffff81043163>] ? ksoftirqd+0x0/0x149
[ 31.248935] [<ffffffff81050ebd>] kthread+0x7a/0x82
[ 31.253986] [<ffffffff8100cb0a>] child_rip+0xa/0x20
[ 31.259156] [<ffffffff81050e43>] ? kthread+0x0/0x82
[ 31.264311] [<ffffffff8100cb00>] ? child_rip+0x0/0x20
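
For anyone reading the oops above, here is a minimal Python sketch of how the "Code:" line can be tied to the fault address. It relies on the oops convention that the byte in angle brackets is the one at RIP. The helper names (split_code_line, decode_mov_disp8) are hypothetical and not part of any kernel tooling, and the decoder deliberately handles only the single instruction form seen here (REX.W 8B with a base register plus 8-bit displacement); it is not a general disassembler. Running objdump or gdb against the matching vmlinux remains the more reliable route.

```python
# Bytes copied from the "Code:" line of the oops above; <..> marks the byte at RIP.
CODE_LINE = (
    "87 e0 00 00 00 48 8b 47 08 48 89 77 08 48 89 3e 48 89 46 08 48 89 30 "
    "c9 c3 31 c0 48 39 3f 55 48 8b 57 18 48 89 e5 75 13 48 8b 02 <48> 8b 50 40 "
    "b8 01 00 00 00 48 85 d2 74 02 ff d2 c9 c3 48 8b 47"
)

# Register numbering for ModRM fields when no REX.R/REX.B extension bit is set.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_code_line(line):
    """Return (bytes before RIP, bytes starting at RIP) as lists of ints."""
    tokens = line.split()
    rip_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    values = [int(tok.strip("<>"), 16) for tok in tokens]
    return values[:rip_index], values[rip_index:]


def decode_mov_disp8(insn):
    """Decode 'mov disp8(%base),%dst' (REX.W 8B /r, mod=01, no SIB) only."""
    rex, opcode, modrm, disp = insn[:4]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not REX.W + 8B (64-bit mov r64, r/m64)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form without SIB is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


if __name__ == "__main__":
    before, at_rip = split_code_line(CODE_LINE)
    dst, base, disp = decode_mov_disp8(at_rip)
    rax = 0x0  # RAX value taken from the register dump above
    print(f"mov {disp:#x}(%{base}),%{dst}")
    print(f"fault address = {rax + disp:#018x}")
```

With RAX at zero in the register dump, the computed address is 0x40, which matches CR2. That is consistent with a load through a NULL pointer at offset 0x40; which structure member sits at that offset is not established by this sketch.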



KERNEL CRASH DUMP:
crash 5.0.6
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

SYSTEM MAP: System.map-2.6.32.61.TEK.V7.12.1.5024.p61
DEBUG KERNEL: vmlinux-2.6.32.61.TEK.V2013.07.08.12.26.13.jrich (2.6.32.61.TEK.V2013.07.08.12.26.13.jrich)
DUMPFILE: DUMP [PARTIAL DUMP]
CPUS: 8
DATE: Fri Jul 5 17:33:19 2013
UPTIME: 00:00:30
LOAD AVERAGE: 1.20, 0.27, 0.09
TASKS: 186
NODENAME: (none)
RELEASE: 2.6.32.61.TEK.V7.12.1.5024.p61
VERSION: #1 SMP PREEMPT Fri Jul 5 12:58:36 CDT 2013
MACHINE: x86_64 (2133 Mhz)
MEMORY: 32 GB
PANIC: "[ 30.788431] Oops: 0000 [#1] PREEMPT SMP " (check log for details)
PID: 25
COMMAND: "ksoftirqd/7"
TASK: ffff88083f94a820 [THREAD_INFO: ffff88083f966000]
CPU: 7
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 25 TASK: ffff88083f94a820 CPU: 7 COMMAND: "ksoftirqd/7"
#0 [ffff8800282e3a20] machine_kexec at ffffffff8102274d
#1 [ffff8800282e3a80] crash_kexec at ffffffff81067e82
#2 [ffff8800282e3b50] oops_end at ffffffff81427349
#3 [ffff8800282e3b80] no_context at ffffffff810280ac
#4 [ffff8800282e3bc0] __bad_area_nosemaphore at ffffffff8102822a
#5 [ffff8800282e3c10] bad_area_nosemaphore at ffffffff8102825b
#6 [ffff8800282e3c20] do_page_fault at ffffffff81428919
#7 [ffff8800282e3c70] page_fault at ffffffff814268ff
[exception RIP: elv_queue_empty+18]
RIP: ffffffff811c1eeb RSP: ffff8800282e3d28 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88083d7dec98 RCX: ffff88083d7dec98
RDX: ffff88083c53d440 RSI: 0000000000000292 RDI: ffff88083d7dec98
RBP: ffff8800282e3d28 R8: ffff8800282e3d68 R9: ffffffff81436400
R10: ffff88083c574850 R11: ffff8800282e3d68 R12: 0000000000000292
R13: ffff8800282e3d98 R14: ffff88083c574850 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff8800282e3d30] __blk_run_queue at ffffffff811c434b
#9 [ffff8800282e3d50] blk_run_queue at ffffffff811c43be
#10 [ffff8800282e3d70] scsi_run_queue at ffffffff812a9040
#11 [ffff8800282e3de0] scsi_next_command at ffffffff812a9be7
#12 [ffff8800282e3e10] scsi_end_request at ffffffff812aa193
#13 [ffff8800282e3e50] scsi_io_completion at ffffffff812aa4aa
#14 [ffff8800282e3ec0] scsi_finish_command at ffffffff812a46fc
#15 [ffff8800282e3ef0] scsi_softirq_done at ffffffff812aa7d8
#16 [ffff8800282e3f20] blk_done_softirq at ffffffff811c8f27
#17 [ffff8800282e3f50] __do_softirq at ffffffff81042fe7
#18 [ffff8800282e3fb0] call_softirq at ffffffff8100cc0c
--- <IRQ stack> ---
#19 [ffff88083f967e68] do_softirq at ffffffff8100e023
#20 [ffff88083f967e88] ksoftirqd at ffffffff810431e5
#21 [ffff88083f967ed8] kthread at ffffffff81050ebd
#22 [ffff88083f967f48] kernel_thread at ffffffff8100cb0a

Attachment: kernelConfig
Description: kernelConfig

root@host-barwick-1-1: ~ # lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)
00:06.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 6 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 22)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
00:0a.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 22)
00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
0e:00.0 Ethernet controller: Intel Corporation 82598 10GbE PCI-Express Ethernet Controller (rev 01)
0e:00.1 Ethernet controller: Intel Corporation 82598 10GbE PCI-Express Ethernet Controller (rev 01)
23:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
23:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
fe:00.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture Generic Non-Core Registers (rev 05)
fe:00.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture System Address Decoder (rev 05)
fe:02.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Link 0 (rev 05)
fe:02.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Physical 0 (rev 05)
fe:02.4 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Link 1 (rev 05)
fe:02.5 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Physical 1 (rev 05)
fe:03.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller (rev 05)
fe:03.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Target Address Decoder (rev 05)
fe:03.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller RAS Registers (rev 05)
fe:03.4 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Test Registers (rev 05)
fe:04.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Control Registers (rev 05)
fe:04.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Address Registers (rev 05)
fe:04.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Rank Registers (rev 05)
fe:04.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Thermal Control Registers (rev 05)
fe:05.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Control Registers (rev 05)
fe:05.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Address Registers (rev 05)
fe:05.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Rank Registers (rev 05)
fe:05.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Thermal Control Registers (rev 05)
fe:06.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Control Registers (rev 05)
fe:06.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Address Registers (rev 05)
fe:06.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Rank Registers (rev 05)
fe:06.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Thermal Control Registers (rev 05)
root@host-barwick-1-1: ~ # lsmod
Module Size Used by
nfsd 247665 13
exportfs 3018 1 nfsd
bond0 78089 0
mptctl 25329 0
ipmi_poweroff 6649 0
ixgbe 171781 0
igb 111723 0
usb_storage 32904 0
ahci 33051 0
mptsas 39520 4
mptscsih 25028 1 mptsas
mptbase 68793 3 mptctl,mptsas,mptscsih
scsi_transport_sas 19871 1 mptsas
edd 7005 0
i2c_dev 4336 0
ipmi_watchdog 12905 1
ipmi_devintf 6259 0
ipmi_si 31933 2
ipmi_msghandler 28814 4 ipmi_poweroff,ipmi_watchdog,ipmi_devintf,ipmi_si