Oops on bootup, 2.6.14-1.1644_FC4smp scsi_mod e1000 root over NFSv3

From: Carl-Johan Kjellander
Date: Thu Dec 15 2005 - 08:46:19 EST


We are getting a repeatable Oops on boot up on an ASUS P5ND2-SLI board.
It only happens intermittently but often enough to be a serious problem.

The machines boot with root over NFS, but there is a SATA drive installed
for some logs. It only happens during boot, never later.

Here is Oops number 1:

*********************

Unable to handle kernel NULL pointer dereferenceACPI: PCI Interrupt Link [APSJ] enabled at IRQ 22
printing eip:
*pde = 00426001
Oops: 0000 [#1]
SMP
Modules linked in: sata_nv libata scsi_mod e1000 nfs lockd nfs_acl sunrpc
CPU: 0
EIP: 0060:[<e09528e6>] Not tainted VLI
EFLAGS: 00010286 (2.6.14-1.1644_FC4smp)
EIP is at scsi_run_queue+0x10/0xaf [scsi_mod]
eax: 00000000 ebx: dd98007c ecx: dffef880 edx: 00000001
esi: ddc99e00 edi: 00000246 ebp: dd9071fc esp: c042af14
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c042a000 task=dfc41ab0)
Stack: dd9071fc dd98007c ddc99e00 00000246 dd9071fc e0952a7d ddc99e00 00000000
00000000 dd98007c e0952e88 00000001 e088260b dcd3a570 c15e3380 dcd3a570
00000000 00000000 00040000 00000024 dd9071fc 00000000 00000000 00000292
Call Trace:
[<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
[<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
[<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
[<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
[<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
[<c02bdf4e>] net_rx_action+0xb7/0x1bb
[<c01258c2>] __do_softirq+0x72/0xdc
[<c0105c43>] do_softirq+0x4b/0x4f
=======================
[<c0125ec2>] ksoftirqd+0x9c/0xe8
[<c0125e26>] ksoftirqd+0x0/0xe8
[<c0133d89>] kthread+0x93/0x97
[<c0133cf6>] kthread+0x0/0x97
[<c0101d5d>] kernel_thread_helper+0x5/0xb
Code: c5 8f df 8b 14 24 8b 42 44 e8 37 b6 9c df 89 44 24 04 89 d8 e8 2e b6 ff ff eb b1 55 57 56 53 83 ec 04 89 04 24 8b 80 10 01 00 00 <8b> 38 80 b8 85 01 00 00 00 0f 88 86 00 00 00 8b 47 44 e8 03 b6
<0>Kernel panic - not syncing: Fatal exception in interrupt
ata3: no device found (phy stat 00000000)
scsi2 : sata_nv
[<c0120358>] panic+0x45/0x1c4
[<c0104caf>] die+0x17b/0x185
[<c031ec40>] do_page_fault+0x0/0x700
[<c031ee49>] do_page_fault+0x209/0x700
[<c031ec40>] do_page_fault+0x0/0x700
[<c010457f>] error_code+0x4f/0x54
[<e09528e6>] scsi_run_queue+0x10/0xaf [scsi_mod]
[<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
[<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
[<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
[<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
[<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
[<c02bdf4e>] net_rx_action+0xb7/0x1bb
[<c01258c2>] __do_softirq+0x72/0xdc
[<c0105c43>] do_softirq+0x4b/0x4f
=======================
[<c0125ec2>] ksoftirqd+0x9c/0xe8
[<c0125e26>] ksoftirqd+0x0/0xe8
[<c0133d89>] kthread+0x93/0x97
[<c0133cf6>] kthread+0x0/0x97
[<c0101d5d>] kernel_thread_helper+0x5/0xb

And here is Oops number 2:

*********************

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
*pde = 00426001
Oops: 0000 [#1]
SMP
Modules linked in: sata_nv libata scsi_mod e1000 nfs lockd nfs_acl sunrpc
CPU: 0
EIP: 0060:[<e09528e6>] Not tainted VLI
EFLAGS: 00010286 (2.6.14-1.1644_FC4smp)
EIP is at scsi_run_queue+0x10/0xaf [scsi_mod]
eax: 00000000 ebx: dd9acb1c ecx: dffef880 edx: 00000001
esi: dd50fc80 edi: 00000246 ebp: dd49a3b8 esp: c042af14
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c042a000 task=dfc41ab0)
Stack: dd49a3b8 dd9acb1c dd50fc80 00000246 dd49a3b8 e0952a7d dd50fc80 00000000
00000000 dd9acb1c e0952e88 00000001 e088260b c04ac408 ded6b380 c1407fe0
00000000 00000000 00040000 00000024 dd49a3b8 00000000 00000000 00000292
Call Trace:
[<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
[<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
[<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
[<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
[<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
[<c02bdf4e>] net_rx_action+0xb7/0x1bb
[<c01258c2>] __do_softirq+0x72/0xdc
[<c0105c43>] do_softirq+0x4b/0x4f
=======================
[<c0125ec2>] ksoftirqd+0x9c/0xe8
[<c0125e26>] ksoftirqd+0x0/0xe8
[<c0133d89>] kthread+0x93/0x97
[<c0133cf6>] kthread+0x0/0x97
[<c0101d5d>] kernel_thread_helper+0x5/0xb
Code: c5 8f df 8b 14 24 8b 42 44 e8 37 b6 9c df 89 44 24 04 89 d8 e8 2e b6 ff ff eb b1 55 57 56 53 83 ec 04 89 04 24 8b 80 10 01 00 00 <8b> 38 80 b8 85 01 00 00 00 0f 88 86 00 00 00 8b 47 44 e8 03 b6
<0>Kernel panic - not syncing: Fatal exception in interrupt
[<c0120358>] panic+0x45/0x1c4
[<c0104caf>] die+0x17b/0x185
[<c031ec40>] do_page_fault+0x0/0x700
[<c031ee49>] do_page_fault+0x209/0x700
[<c031ec40>] do_page_fault+0x0/0x700
[<c010457f>] error_code+0x4f/0x54
[<e09528e6>] scsi_run_queue+0x10/0xaf [scsi_mod]
[<e0952a7d>] scsi_end_request+0x83/0xb0 [scsi_mod]
[<e0952e88>] scsi_io_completion+0x29e/0x4d2 [scsi_mod]
[<e088260b>] e1000_clean_rx_irq+0x95/0x4f1 [e1000]
[<e094dcb2>] scsi_finish_command+0x82/0xb5 [scsi_mod]
[<e094db97>] scsi_softirq+0xc0/0x133 [scsi_mod]
[<c02bdf4e>] net_rx_action+0xb7/0x1bb
[<c01258c2>] __do_softirq+0x72/0xdc
[<c0105c43>] do_softirq+0x4b/0x4f
=======================
[<c0125ec2>] ksoftirqd+0x9c/0xe8
[<c0125e26>] ksoftirqd+0x0/0xe8
[<c0133d89>] kthread+0x93/0x97
[<c0133cf6>] kthread+0x0/0x97
[<c0101d5d>] kernel_thread_helper+0x5/0xb


Both Oopses are at the same spot with scsi_mod and e1000 being the culprits.

Hardware:
ASUS P5ND2-SLI
P4 630+
Double Intel e1000 on motherboard
SATA Disk (only for some logs, not booted from)
XFX NVidia GeForce 6800XT (driver not loaded so not tainted)

Kernel:
2.6.14-1.1644_FC4smp standard Fedora Core 4 kernel with a custom initrd
to be able to boot off network.

It might be connected to when sata_nv looking for drives, but I'm not
sure.

Has anybody seen this before? Is there a patch? Workaround?

Please CC me as I'm not on the list.

/Carl-Johan Kjellander
--
begin 644 carljohan_at_kjellander_dot_com.gif
Y1TE&.#=A(0`F`(```````/___RP`````(0`F```"@XR/!\N<#U.;+MI`<[U(>\!UGQ9BGT%>'D2I
Y*=NX,2@OUF2&<827ILW;^822C>\7!!Z1,!K'B5(6H<SH-"E*TJ3%*/>QI6:7"A>Y?):D2^*U@NCV
R<MOQ=]V(B6>LZYD-_T1U<@3W]A4(^$-W4]A#V")W6#.R"$;IR'@).46BN7$9>5D``#L`

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature