Re: kernel exception in Fusion MPT base driver 3.04.06(linux-2.6.24.3)

From: Andrew Morton
Date: Tue Mar 18 2008 - 03:25:48 EST


On Mon, 17 Mar 2008 18:29:28 -0500 "Sabuj Pattanayek" <sabujp@xxxxxxxxx> wrote:

> Hi,

Let's add some cc's.

> I'm running (uname -a):
>
> Linux porpoise 2.6.24.3 #11 Mon Mar 17 17:24:30 CDT 2008 ppc64
> PPC970FX, altivec supported RackMac3,1 GNU/Linux
>
> on an Apple Xserve G5. With the following fibre channel card (lspci
> -vvv, it has two fibre connections):
>
> 0001:06:03.0 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre
> Channel Adapter (rev 81)
> Subsystem: LSI Logic / Symbios Logic Unknown device 10d0
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
> >TAbort- <TAbort- <MAbort- >SERR- <PERR+ INTx-
> Latency: 16 (16000ns min, 2500ns max), Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 53
> Region 0: I/O ports at 0400 [size=256]
> Region 1: Memory at 90030000 (64-bit, non-prefetchable) [size=64K]
> Region 3: Memory at 90020000 (64-bit, non-prefetchable) [size=64K]
> Expansion ROM at 90200000 [disabled] [size=1M]
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable-
> Address: 0000000000000000 Data: 0000
> Capabilities: [68] PCI-X non-bridge device
> Command: DPERE- ERO- RBC=2048 OST=8
> Status: Dev=ff:1f.0 64bit+ 133MHz+ SCD- USC- DC=simple
> DMMRBC=2048 DMOST=8 DMCRS=64 RSCEM- 266MHz- 533MHz-
>
> 0001:06:03.1 Fibre Channel: LSI Logic / Symbios Logic FC929X Fibre
> Channel Adapter (rev 81)
> Subsystem: LSI Logic / Symbios Logic Unknown device 10d0
> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
> >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 16 (16000ns min, 2500ns max), Cache Line Size: 64 bytes
> Interrupt: pin B routed to IRQ 53
> Region 0: I/O ports at <unassigned> [disabled]
> Region 1: Memory at 90010000 (64-bit, non-prefetchable)
> [disabled] [size=64K]
> Region 3: Memory at 90000000 (64-bit, non-prefetchable)
> [disabled] [size=64K]
> Expansion ROM at 90100000 [disabled] [size=1M]
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable-
> Address: 0000000000000000 Data: 0000
> Capabilities: [68] PCI-X non-bridge device
> Command: DPERE- ERO- RBC=2048 OST=8
> Status: Dev=ff:1f.1 64bit+ 133MHz+ SCD- USC- DC=simple
> DMMRBC=2048 DMOST=8 DMCRS=64 RSCEM- 266MHz- 533MHz-
>
> After doing "modprobe mptfc" I get the following kernel exception
> (dmesg output):
>
> Fusion MPT base driver 3.04.06
> Copyright (c) 1999-2007 LSI Corporation
> Fusion MPT FC Host driver 3.04.06
> PCI: Enabling device: (0001:06:03.0), cmd 7
> mptbase: ioc0: Initiating bringup
> ioc0: LSIFC929XL A1: Capabilities={Initiator,Target,LAN}
> scsi4 : ioc0: LSIFC929XL A1, FwRev=01020e00h, Ports=1, MaxQ=1023, IRQ=53
> mptfc: ioc0: FC Link Established, Speed = 2 Gbps
> Oops: Exception in kernel mode, sig: 4 [#1]
> PowerMac
> Modules linked in: mptfc mptscsih mptbase
> NIP: d000000000144984 LR: d00000000014923c CTR: 0000000000000000
> REGS: c0000001387ef8f0 TRAP: 0700 Not tainted (2.6.24.3)
> MSR: 9000000000089032 <EE,ME,IR,DR> CR: 24002028 XER: 00000000
> TASK = c0000001388db040[12431] 'mptfc_wq_4' THREAD: c0000001387ec000
> GPR00: 00000000000000d3 c0000001387efb70 d000000000163b70 c000000138a8af1c
> GPR04: 00000000d3000000 00000000ffffffff 0000000000000000 c000000138a8af00
> GPR08: fffffffffffffffd 00000000000000ff 0000000000000000 00000000ffffffff
> GPR12: d000000000150040 c0000000006f8280 000000000023fb28 c0000000005e25e0
> GPR16: c0000001387efcb0 c0000001380dd758 c0000001387efcc0 c000000138978000
> GPR20: 0000000000000000 c0000001387da000 000000000000067d c0000001380dd454
> GPR24: c0000001387dd3e8 0000000000010000 d000000000159b87 c0000001380dd000
> GPR28: c0000001387efcc0 c0000001387efcd0 d000000000162500 c000000138a8af00
> NIP [d000000000144984] .mpt_add_sge+0x14/0x50 [mptbase]
> LR [d00000000014923c] .mpt_config+0x16c/0x3b0 [mptbase]
> Call Trace:
> [c0000001387efb70] [d000000000149120] .mpt_config+0x50/0x3b0 [mptbase]
> (unreliable)
> [c0000001387efc40] [d00000000015ec8c] .mptfc_rescan_devices+0x17c/0x780 [mptfc]
> [c0000001387efda0] [c0000000000548ac] .run_workqueue+0xec/0x1d0
> [c0000001387efe40] [c0000000000556cc] .worker_thread+0xdc/0x190
> [c0000001387eff00] [c00000000005a2e8] .kthread+0xc8/0xe0
> [c0000001387eff90] [c0000000000217cc] .kernel_thread+0x4c/0x68
> Instruction dump:
> 200000d0 230fff00 210000d0 00010000 02010010 08002500 09200100 03090006
> 230fff00 200000d0 230fff00 210000d0 <00010000> 02010010 08002500 09200100
> ---[ end trace 45fe87d4d747696e ]---

OK, mpt-fusion seems to have gone splat.
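
For what it's worth, TRAP 0700 is the program check vector and sig 4 is
SIGILL, so the CPU was asked to execute something it did not recognise.
The word at NIP is the one in angle brackets in the instruction dump. A
rough sketch for pulling it out and looking at its primary opcode (the top
6 bits of a PowerPC instruction word); the helper names here are made up
for illustration and are not from any kernel tool:

```python
# Second line of the "Instruction dump:" from oops #1, copied verbatim.
OOPS1_DUMP_LINE = (
    "230fff00 200000d0 230fff00 210000d0 <00010000> 02010010 08002500 09200100"
)


def parse_dump(line):
    """Return (all_words, faulting_word) as hex strings.

    The kernel marks the word at NIP by wrapping it in angle brackets.
    """
    words = []
    faulting = None
    for tok in line.split():
        if tok.startswith("<") and tok.endswith(">"):
            tok = tok[1:-1]
            faulting = tok
        words.append(tok)
    return words, faulting


def primary_opcode(word_hex):
    """Primary opcode of a 32-bit PowerPC instruction word (bits 0..5)."""
    return int(word_hex, 16) >> 26


if __name__ == "__main__":
    words, faulting = parse_dump(OOPS1_DUMP_LINE)
    for w in words:
        marker = "  <-- NIP" if w == faulting else ""
        print(f"{w}  primary opcode {primary_opcode(w):2d}{marker}")
```

The marked word comes out with primary opcode 0, which as far as I know is
not a valid instruction encoding on this CPU. That would suggest
mpt_add_sge's text was not code any more by the time it was called, but
treat that as a guess rather than a diagnosis.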

> Unable to handle kernel paging request for data at address 0x230fff00210000d0
> Faulting instruction address: 0xc000000000077658
> Oops: Kernel access of bad area, sig: 11 [#2]
> PowerMac
> Modules linked in: mptfc mptscsih mptbase
> NIP: c000000000077658 LR: c000000000077624 CTR: 000000000000000c
> REGS: c00000013a3b7530 TRAP: 0300 Tainted: G D (2.6.24.3)
> MSR: 9000000000009032 <EE,ME,IR,DR> CR: 24002028 XER: 00000000
> DAR: 230fff00210000d0, DSISR: 0000000040000000
> TASK = c00000013a38a810[156] 'pdflush' THREAD: c00000013a3b4000
> GPR00: 0000000000000010 c00000013a3b77b0 c0000000007e8218 000000000000000e
> GPR04: 000000000000000e 00000000000008e8 c00000013897e6b8 0000000000024000
> GPR08: 0000000000000002 0000000000000002 230fff00210000d0 0000000000000003
> GPR12: 0000000000000010 c0000000006f8280 000000000023fb28 c0000000005e25e0
> GPR16: c0000000006c6460 c00000013924fb28 0000000000000000 c00000013a3b7948
> GPR20: c00000000078e2a0 0000000000000000 c0000001381dd440 c00000013924fb28
> GPR24: ffffffffffffffff 000000000000000e 0000000000000000 c00000013a3b7da8
> GPR28: c00000013a3b7940 000000000000000e c000000000765818 c00000013a3b7958
> NIP [c000000000077658] .find_get_pages_tag+0x78/0x100
> LR [c000000000077624] .find_get_pages_tag+0x44/0x100
> Call Trace:
> [c00000013a3b77b0] [c000000000110330]
> .ext3_ordered_writepage+0x180/0x1e0 (unreliable)
> [c00000013a3b7840] [c0000000000825fc] .pagevec_lookup_tag+0x2c/0x50
> [c00000013a3b78d0] [c000000000080150] .write_cache_pages+0x1d0/0x450
> [c00000013a3b7a50] [c0000000000804b4] .do_writepages+0xa4/0xb0
> [c00000013a3b7ad0] [c0000000000cbf60] .__writeback_single_inode+0xb0/0x3c0
> [c00000013a3b7bd0] [c0000000000cc724] .sync_sb_inodes+0x224/0x390
> [c00000013a3b7c90] [c0000000000ccbb4] .writeback_inodes+0xe4/0x110
> [c00000013a3b7d30] [c0000000000810f0] .wb_kupdate+0xf0/0x190
> [c00000013a3b7e30] [c000000000081a28] .pdflush+0x178/0x280
> [c00000013a3b7f00] [c00000000005a2e8] .kthread+0xc8/0xe0
> [c00000013a3b7f90] [c0000000000217cc] .kernel_thread+0x4c/0x68
> Instruction dump:
> 7c7d1b79 41820074 393dffff 3ce00002 39000000 79290020 60e74000 39290001
> 7d2903a6 60000000 79001f24 7d5f002a <e92a0000> 7d293838 7fa93800 41de0068
> ---[ end trace 45fe87d4d747696e ]---

And that's a totally, wildly different part of the kernel. And it's a
code path which millions of machines run all the time.

It's hard to see how oops #2 could be a consequence of oops #1. Weird.
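
One thing that might connect them: the bad address in oops #2 (DAR, also
sitting in GPR10) looks like it is made of two adjacent words from the
instruction dump of oops #1. A small sketch to check that, assuming the
dump words are consecutive 32-bit big-endian words (ppc64 here is
big-endian) and using illustrative helper names of my own:

```python
# Values copied from the two oops reports above.
OOPS1_DUMP = (
    "200000d0 230fff00 210000d0 00010000 02010010 08002500 09200100 03090006 "
    "230fff00 200000d0 230fff00 210000d0 <00010000> 02010010 08002500 09200100"
)
OOPS2_DAR = 0x230FFF00210000D0


def dump_words(dump):
    """Parse the dump into 32-bit integers, dropping the <...> NIP marker."""
    return [int(tok.strip("<>"), 16) for tok in dump.split()]


def find_u64(dump, value):
    """Return word indexes i where words[i] and words[i+1], read as one
    big-endian 64-bit quantity, equal `value`."""
    words = dump_words(dump)
    return [
        i
        for i, (hi, lo) in enumerate(zip(words, words[1:]))
        if ((hi << 32) | lo) == value
    ]


if __name__ == "__main__":
    hits = find_u64(OOPS1_DUMP, OOPS2_DAR)
    if hits:
        print(f"DAR {OOPS2_DAR:#018x} found at word offsets {hits}")
    else:
        print("DAR does not appear in the oops #1 dump")
```

If the pattern does match, the same data which replaced the mptbase code
also ended up being used as a pointer in find_get_pages_tag, which would
point at something scribbling over memory rather than two unrelated bugs.
That is only a hypothesis.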

> Any ideas what's going on here? Attempting to run any further commands
> (e.g. reboot) causes a segfault or hang of the command (e.g. ls). I
> can send the enabled/modularized options of the .config in another
> email if required. Please CC replies to sabujp@xxxxxxxxx since I'm not
> on the list.
>

Did any earlier kernel work OK? If so, which? 2.6.23??

Thanks.