Re: 2.6.31-rc8 + patch-2.6.31-rc8-rt9 = oops in mptsas

From: Glenn Elliott
Date: Wed Sep 09 2009 - 10:54:51 EST


Desai, Kashyap wrote:
Glenn,

After applying the patch
http://marc.info/?l=linux-scsi&m=125187353611068&w=2
my understanding is that the oops will not be the same. Is that correct?

I have taken some important snippets from your oops message, as below.

.flush_workqueue+0x68/0xb8
[c0000007fde23500] [c00000000030c320] .mptsas_cleanup_fw_event_q+0x128/0x154
[c0000007fde235b0] [c00000000030c650] .mptsas_ioc_reset+0x98/0xe0
[c0000007fde23640] [c0000000002f9610] .mpt_signal_reset+0x94/0xb4
[c0000007fde236c0] [c0000000003018e4] .mpt_do_ioc_recovery+0x15ec/0x16e8
[c0000007fde23890] [c000000000301ad8] .mpt_HardResetHandler+0xf8/0x19c



With the patch applied, flush_workqueue() is no longer called from mptsas_ioc_reset(), as it was without the patch.

Please add more details if I am guessing wrong.

Thanks,
Kashyap
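
The check described above (whether flush_workqueue() is still reached through mptsas_ioc_reset() in a newly captured trace) can be done mechanically. What follows is a minimal sketch, not something posted in this thread: the helper names parse_frames and called_from are hypothetical, and the regular expression assumes the PowerPC backtrace layout seen in the quoted oops ("[stack] [address] .symbol+0xoff/0xsize"). The sample is an abridged copy of the quoted call trace.

```python
import re

# One backtrace frame in the layout used by the quoted oops, e.g.
# [c0000007fde23500] [c00000000030c320] .mptsas_cleanup_fw_event_q+0x128/0x154
# Frames without a symbol name (such as "0x200200 (unreliable)") are skipped.
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+"
    r"\.?(?P<sym>[A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Abridged copy of the call trace from the original report.
SAMPLE_TRACE = """\
[c0000007fde23370] [0000000000200200] 0x200200 (unreliable)
[c0000007fde23470] [c000000000087cac] .flush_workqueue+0x68/0xb8
[c0000007fde23500] [c00000000030c320] .mptsas_cleanup_fw_event_q+0x128/0x154
[c0000007fde235b0] [c00000000030c650] .mptsas_ioc_reset+0x98/0xe0
[c0000007fde23640] [c0000000002f9610] .mpt_signal_reset+0x94/0xb4
[c0000007fde23d90] [c0000000000876e8] .worker_thread+0x20c/0x2e0
"""


def parse_frames(log_text):
    """Return the symbol names of all recognised frames, innermost first."""
    return [m.group("sym") for m in FRAME_RE.finditer(log_text)]


def called_from(frames, callee, caller):
    """True if `callee` appears deeper in the trace than `caller`.

    The list is innermost first, so a callee has a smaller index than
    any function that (directly or indirectly) called it.
    """
    if callee not in frames or caller not in frames:
        return False
    return frames.index(callee) < frames.index(caller)


if __name__ == "__main__":
    frames = parse_frames(SAMPLE_TRACE)
    print("frames:", " <- ".join(frames))
    print("flush via mptsas_ioc_reset:",
          called_from(frames, "flush_workqueue", "mptsas_ioc_reset"))
    print("flush inside worker_thread:",
          called_from(frames, "flush_workqueue", "worker_thread"))
```

To use it on a fresh capture, replace SAMPLE_TRACE with the text saved from the serial console. The second check only reports that the flush happened in a workqueue worker's context; whether that is the cause of the warning is an assumption that would still need to be confirmed against the kernel source.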


-----Original Message-----
From: Glenn Elliott [mailto:arakageeta.lkml@xxxxxxxxx]
Sent: Wednesday, September 09, 2009 2:26 AM
To: Desai, Kashyap
Cc: linux-kernel@xxxxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx; DL-MPT Fusion Linux; Bjoern Brandenburg
Subject: Re: 2.6.31-rc8 + patch-2.6.31-rc8-rt9 = oops in mptsas

Desai, Kashyap wrote:
Glenn,

There is a fix in the same area that was recently posted upstream.
Can you try applying this patch?

http://marc.info/?l=linux-scsi&m=125187353611068&w=2

Thanks,
Kashyap

-----Original Message-----
From: Glenn Elliott [mailto:arakageeta.lkml@xxxxxxxxx]
Sent: Friday, September 04, 2009 10:20 PM
To: linux-kernel@xxxxxxxxxxxxxxx
Cc: tglx@xxxxxxxxxxxxx; DL-MPT Fusion Linux; Bjoern Brandenburg
Subject: 2.6.31-rc8 + patch-2.6.31-rc8-rt9 = oops in mptsas

Hello,

I get an oops when I boot 2.6.31-rc8 with the Realtime Preempt patch, patch-2.6.31-rc8-rt9, on my IBM QS22 (Cell Blade, PPC-based). It appears to happen somewhere in mptsas, the SAS disk driver.

The unpatched 2.6.31-rc8 boots without issue. I am using the cell_defconfig configuration with the same minor additions (IPv6, auditing, etc.) for both patched and unpatched kernels. The RT-patched configuration also includes the necessary RT-related settings.

Below is the captured oops, with a little extra logging, from the serial console (it didn't make it to /var/log/messages). I would be happy to provide any additional information.

Thank you,
Glenn Elliott

mptscsih: ioc0: attempting task abort! (sc=c0000007fdd02080)
sd 0:0:0:0: CDB: cdb[0]=0x1a: 1a 00 08 00 04 00
mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!!
mptbase: ioc0: Initiating recovery
mptscsih: ioc0: task abort: SUCCESS (sc=c0000007fdd02080)
mptscsih: ioc0: attempting task abort! (sc=c0000007fdd02080)
sd 0:0:0:0: CDB: cdb[0]=0x0: 00 00 00 00 00 00
mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!
mptbase: ioc0: Initiating recovery
mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!!
mptscsih: ioc0: task abort: SUCCESS (sc=c0000007fdd02080)
mptscsih: ioc0: attempting target reset! (sc=c0000007fdd02080)
sd 0:0:0:0: CDB: cdb[0]=0x1a: 1a 00 08 00 04 00
mptscsih: ioc0: WARNING - TaskMgmt type=3: ioc_state: DOORBELL_ACTIVE (0x2c000000)!
mptscsih: ioc0: target reset: FAILED (sc=c0000007fdd02080)
mptscsih: ioc0: attempting bus reset! (sc=c0000007fdd02080)
sd 0:0:0:0: CDB: cdb[0]=0x1a: 1a 00 08 00 04 00
mptscsih: ioc0: WARNING - TaskMgmt type=4: ioc_state: DOORBELL_ACTIVE (0x2c000000)!
mptscsih: ioc0: bus reset: FAILED (sc=c0000007fdd02080)
mptscsih: ioc0: attempting host reset! (sc=c0000007fdd02080)
mptscsih: ioc0: host reset: SUCCESS (sc=c0000007fdd02080)
------------[ cut here ]------------
Badness at kernel/workqueue.c:372
NIP: c000000000086a04 LR: c000000000087cac CTR: c00000000030c5b8
REGS: c0000007fde230f0 TRAP: 0700 Not tainted (2.6.31-rc8-rt9)
MSR: 9000000000029032 <EE,ME,CE,IR,DR> CR: 44022024 XER: 20000000
TASK = c0000007fa3e5c50[2606] 'mpt/0' THREAD: c0000007fde20000 CPU: 2
GPR00: 0000000000000001 c0000007fde23370 c0000000006992b0 c0000003fdde0c80
GPR04: 0000000000000000 0000000000000000 000000000000000a c0000003fe0ce114
GPR08: 0000000000000000 c0000007fa3e5c50 c00000000044ebb0 0000000000000000
GPR12: 0000000000000000 c000000000722a00 0000000000000000 0000000000000004
GPR16: c0000003fe0ce998 c0000003fe0ce968 0000000000000000 0000000000000000
GPR20: 0000000000000001 0000000000000000 c0000003fe0ce108 0000000000000001
GPR24: 0000000000000000 0000000000000001 c0000003fe0ce100 c0000003fe0ce720
GPR28: c0000003fddf4000 c0000003fdde0c80 c000000000640080 0000000000000000
NIP [c000000000086a04] .flush_cpu_workqueue+0x2c/0xa4
LR [c000000000087cac] .flush_workqueue+0x68/0xb8
Call Trace:
[c0000007fde23370] [0000000000200200] 0x200200 (unreliable)
[c0000007fde23470] [c000000000087cac] .flush_workqueue+0x68/0xb8
[c0000007fde23500] [c00000000030c320] .mptsas_cleanup_fw_event_q+0x128/0x154
[c0000007fde235b0] [c00000000030c650] .mptsas_ioc_reset+0x98/0xe0
[c0000007fde23640] [c0000000002f9610] .mpt_signal_reset+0x94/0xb4
[c0000007fde236c0] [c0000000003018e4] .mpt_do_ioc_recovery+0x15ec/0x16e8
[c0000007fde23890] [c000000000301ad8] .mpt_HardResetHandler+0xf8/0x19c
[c0000007fde23930] [c00000000030215c] .mpt_config+0x3d4/0x470
[c0000007fde23a30] [c0000000002ffd28] .mpt_findImVolumes+0xd0/0x6a0
[c0000007fde23c00] [c00000000030dacc] .mptsas_firmware_event_work+0x74/0x109c
[c0000007fde23d90] [c0000000000876e8] .worker_thread+0x20c/0x2e0
[c0000007fde23ea0] [c00000000008cb88] .kthread+0xa8/0xb4
[c0000007fde23f90] [c000000000025b68] .kernel_thread+0x54/0x70
Instruction dump:
4bfffe34 fba1ffe8 7c0802a6 f8010010 7c7d1b78 fbe1fff8 f821ff01 e80d01b0
e92300a0 7c004a78 7c000074 7800d182 <0b000000> 48395349 60000000 38bd0038

Thank you for your suggestion, Kashyap, but it does not appear to help. The system still hangs on boot. Is there any other information I can gather that may be helpful?

-Glenn

I will try to get more information. The system is touchy-- I rarely get an
oops message. In fact, I've posted the only one that I've received. I have
booted my system many times and, so far, it simply hangs.

Thank you,
Glenn
