[BUG] OOPS 2.6.24.2 raid5 write with ioatdma

From: Laurent CORBES
Date: Fri Feb 15 2008 - 11:45:26 EST


Hi all,

I get a raid5 oops when writing to a RAID 5 array with ioatdma loaded and
DCA disabled in the BIOS:

------------[ cut here ]------------
kernel BUG at crypto/async_tx/async_xor.c:185!
invalid opcode: 0000 [#2] SMP
Modules linked in: dm_snapshot dm_mirror dm_mod thermal parport_pc parport button processor

Pid: 1135, comm: md11_raid5 Tainted: G D (2.6.24.2-sj-std-p4-smp #2)
EIP: 0060:[<c020713b>] EFLAGS: 00010202 CPU: 2
EIP is at async_xor+0x31b/0x320
EAX: f7556f5c EBX: f7556f5c ECX: f77433f8 EDX: c039d410
ESI: 00000001 EDI: 00000001 EBP: 00000000 ESP: f756dcfc
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process md11_raid5 (pid: 1135, ti=f756c000 task=f7032db0 task.ti=f756c000)
Stack: 00000001 00000000 c02070fc 00000400 00000000 f756dd70 c16eda20 00000000
00000000 3770d000 00000001 0000001a 00000000 00000001 f7556f5c f77433f8
c039d410 00000002 f61fa940 f756dd70 f756ddc0 c039d841 00000001 00001000
Call Trace:
[<c02070fc>] async_xor+0x2dc/0x320
[<c039d410>] ops_complete_write+0x0/0x60
[<c039d841>] ops_run_postxor+0xd1/0x160
[<c039d410>] ops_complete_write+0x0/0x60
[<c039c569>] async_copy_data+0x79/0x140
[<c039e9d1>] handle_stripe5+0x1021/0x1570
[<c02e882d>] scsi_alloc_sgtable+0x7d/0x1d0
[<c02e89d2>] scsi_init_io+0x52/0xd0
[<c02e8497>] scsi_get_cmd_from_req+0x27/0x40
[<c03adb20>] md_thread+0x0/0xe0
[<c039ff08>] handle_stripe+0x28/0xef0
[<c03adb20>] md_thread+0x0/0xe0
[<c03ac154>] md_check_recovery+0x24/0x4e0
[<c043009b>] schedule+0x1fb/0x7c0
[<c03adb20>] md_thread+0x0/0xe0
[<c03a114d>] raid5d+0x37d/0x400
[<c0127837>] lock_timer_base+0x27/0x60
[<c01278ce>] del_timer_sync+0xe/0x20
[<c04308a1>] schedule_timeout+0x51/0xc0
[<c03adb20>] md_thread+0x0/0xe0
[<c03adb20>] md_thread+0x0/0xe0
[<c03adb43>] md_thread+0x23/0xe0
[<c0131900>] autoremove_wake_function+0x0/0x40
[<c03adb20>] md_thread+0x0/0xe0
[<c0131652>] kthread+0x42/0x70
[<c0131610>] kthread+0x0/0x70
[<c01037b7>] kernel_thread_helper+0x7/0x10
=======================
Code: fe ff ff 8b 5c 24 64 c7 43 04 01 00 00 00 e9 63 fe ff ff 0f 0b eb fe c7 44 24 04 a9 41 44 c0 c7 04 24 9c e7 4b c0 e8 15 7d f1 ff <0f> 0b eb fe 90 55 57 89 cf 56 53 89 d3 83 ec 20 ba 05 00 00 00
EIP: [<c020713b>] async_xor+0x31b/0x320 SS:ESP 0068:f756dcfc
---[ end trace 091e56cc9ca29fd6 ]---

It seems that async_tx cannot process the xor (perhaps it tries to use
ioatdma and fails?).
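
As a side note on reading the trace: the bytes marked <0f> 0b in the Code:
line are the x86 ud2 instruction, which is what BUG() compiles to on this
architecture, so the "invalid opcode" is the assertion at async_xor.c:185
firing rather than a stray fault. A minimal sketch for checking this from a
Code: line follows; the helper names faulting_bytes and is_bug_trap are made
up for illustration and are not part of any kernel tool.

```python
import re

# x86 encoding of the ud2 instruction, which BUG() uses on this architecture.
UD2 = bytes([0x0F, 0x0B])


def faulting_bytes(code_line):
    """Return the opcode bytes from the <xx> marker onward, or None if absent."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            # Join the marked byte with everything that follows it.
            return bytes.fromhex(m.group(1) + "".join(tokens[i + 1:]))
    return None


def is_bug_trap(code_line):
    """True if the faulting instruction is ud2 (hypothetical helper)."""
    fb = faulting_bytes(code_line)
    return fb is not None and fb[:2] == UD2


# Tail of the Code: line from the oops above.
sample = "Code: e8 15 7d f1 ff <0f> 0b eb fe 90 55 57 89 cf"
print(is_bug_trap(sample))
```

This only confirms that a BUG() was hit; which condition failed at line 185
still has to be read from the 2.6.24.2 source.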

When I enable DCA in the system BIOS I cannot boot: the ioatdma subsystem
fails to initialize and stalls at:
ioatdma: ioat_dma_test_callback(00008086)

Here is the mdstat:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdd2[3] sdc2[2] sdb2[1] sda2[0]
979840 blocks [4/4] [UUUU]

md2 : active raid1 sdh2[3] sdg2[2] sdf2[1] sde2[0]
979840 blocks [4/4] [UUUU]

md10 : active raid5 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
6830121984 blocks level 5, 256k chunk, algorithm 2 [8/8] [UUUUUUUU]
bitmap: 0/233 pages [0KB], 2048KB chunk

md3 : active raid1 sdl2[3] sdk2[2] sdj2[1] sdi2[0]
979840 blocks [4/4] [UUUU]

md4 : active raid1 sdp2[3] sdo2[2] sdn2[1] sdm2[0]
979840 blocks [4/4] [UUUU]

md11 : active raid5 sdp3[7] sdo3[6] sdn3[5] sdm3[4] sdl3[3] sdk3[2] sdj3[1] sdi3[0]
6830121984 blocks level 5, 256k chunk, algorithm 2 [8/8] [UUUUUUUU]
bitmap: 0/233 pages [0KB], 2048KB chunk

md0 : active raid1 sdb1[1] sda1[0]
48064 blocks [2/2] [UU]

unused devices: <none>

Here is the lspci:
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev b1)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset Error Reporting Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
03:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
03:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
06:00.0 Ethernet controller: Intel Corporation 631xESB/632xESB DPT LAN Controller Copper (rev 01)
06:00.1 Ethernet controller: Intel Corporation 631xESB/632xESB DPT LAN Controller Copper (rev 01)
09:00.0 PCI bridge: Intel Corporation 80333 Segment-A PCI Express-to-PCI Express Bridge
09:00.2 PCI bridge: Intel Corporation 80333 Segment-B PCI Express-to-PCI Express Bridge
0a:0e.0 RAID bus controller: Areca Technology Corp. ARC-1260 16-Port PCI-Express to SATA RAID Controller
0d:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

The full dmesg is attached.

Thanks.
--
Laurent Corbes - laurent.corbes@xxxxxxxxxxxx
+33 (0)1 4996 6325
Smartjog SA - http://www.smartjog.com/

Attachment: dmesg
Description: Binary data