Re: nforce chipset + dualcore x86-64: Oops, NMI, Null pointer deref,etc

From: Robert Hancock
Date: Sat Dec 02 2006 - 12:48:03 EST


Bart Trojanowski wrote:
Hi,

to see my history please see this thread...

http://lkml.org/lkml/2006/11/29/156

It might be sata related, but I cannot tell for sure.

In summary, I have an Opteron 170 in a Shuttle SN25P (nforce4 chipset).
I've tested the ram overnight and swapped out every component in the
system except for the HDDs. I see these problems only with the
dual-core, and even on an older Asus nforce4 based motherboard.

I've had a hell of a time getting 2.6.18 to stay up longer then a day.
2.6.19 is better but I am still seeing strangeness. This morning I
noticed the following events (the complete dmesg is included below).

[36514.296462] Unable to handle kernel paging request at 00000000ff757c4f RIP: ..
[36514.515894] NMI Watchdog detected LOCKUP on CPU 0
..
[36519.958059] Unable to handle kernel NULL pointer dereference at 0000000000000038 RIP:

Now the system is quite non-responsive. I start programs, but they
don't run for a few minutes. Existing programs seem to work ok.

Any suggestions would be appreciated... I am going to boot with noapic
for now. From what I can tell things work better with APIC disabled.

...

[ 27.337641] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 27.344035] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 27.352191] Probing IDE interface ide0...
[ 28.145997] hda: PLEXTOR DVDR PX-716AL, ATAPI CD/DVD-ROM drive
[ 28.457643] Probing IDE interface ide1...
[ 28.970267] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

Hmm. Something seems missing here. Chipset-specific driver not enabled? I don't see any references to DMA mode or the controller type. It seems like in this case something in drivers/ide is blowing up later on (which shouldn't happen in any case but that code is not the greatest).

Myself, I would try disabling CONFIG_IDE entirely and enabling the corresponding new libata PATA driver for this chipset's PATA ports (for nForce4 it will be pata_amd). Even if it still doesn't work it may allow the real problem to be diagnosed more easily.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@xxxxxxxxxxxxx
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/