kernel 2.2.15 - Uhhuh NMI Received

From: Sammy Lau (sammy@outblaze.com)
Date: Tue May 23 2000 - 01:12:31 EST


Hi all,

        =====
        Uhhuh. NMI received. Dazed and confused, but trying to continue
        You probably have a hardware problem with your RAM chips
        =====

        Recently, I upgraded the kernel from 2.2.14 to 2.2.15 (stable), and
the kernel started showing an error message (not right after system
bootup; I found it in dmesg) which I think is causing file system
corruption. I've searched through some mailing list archives, and most
of them said it is very likely caused by a faulty RAM chip.

        However, when I traced through the kernel source, I found that the
message comes from a function called mem_parity_error
(arch/i386/kernel/traps.c). As the RAM chips I'm using don't have parity
checking, I assumed mem_parity_error should not be called under any
circumstances. Besides, I've run the memory subsystem test utility by
Charles Cazabon and Simon Kirby, and it reported no errors. I've also
checked the BIOS to see whether there is
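The point about mem_parity_error is that it is chosen by the NMI handler from a status byte, not by any parity logic on the memory modules. The sketch below is a simplified model written from memory of the 2.2-era i386 do_nmi(), not copied from traps.c: it assumes the handler reads port 0x61 and tests bit 7 (parity/SERR) and bit 6 (I/O channel check) independently. The CALL_* flags and nmi_handlers_for() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Bits of the NMI status byte that do_nmi() is assumed to read from
 * port 0x61 ("system control port B" on PC-compatible chipsets). */
#define NMI_REASON_SERR_PARITY 0x80u /* bit 7: memory parity error / PCI SERR# */
#define NMI_REASON_IOCHK       0x40u /* bit 6: I/O channel check (IOCHK#) */

/* Hypothetical flags naming the handler(s) a do_nmi()-style dispatcher calls. */
#define CALL_MEM_PARITY_ERROR 0x1u /* prints "Uhhuh. NMI received. Dazed and confused..." */
#define CALL_IO_CHECK_ERROR   0x2u /* I/O channel check handler */
#define CALL_UNKNOWN_NMI      0x4u /* neither status bit was set */

/* Decode the reason byte the way the 2.2-era handler is assumed to:
 * bits 7 and 6 are tested independently, so both handlers can fire. */
static unsigned nmi_handlers_for(uint8_t reason)
{
    unsigned calls = 0;

    if (reason & NMI_REASON_SERR_PARITY)
        calls |= CALL_MEM_PARITY_ERROR;
    if (reason & NMI_REASON_IOCHK)
        calls |= CALL_IO_CHECK_ERROR;
    if (!(reason & (NMI_REASON_SERR_PARITY | NMI_REASON_IOCHK)))
        calls |= CALL_UNKNOWN_NMI;
    return calls;
}
```

Under this model, the message only says that bit 7 of the status byte was set when the NMI arrived. As far as I know, on PIIX4-class chipsets that bit also latches PCI SERR#, so the sketch by itself does not identify which piece of hardware raised it.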

        Is there any way for me to gather more information to confirm whether
it is a faulty memory problem or an already known bug? I've tried to
provide more info below. Sorry for the bulky e-mail, and please excuse
any bogus points I've made above.

        I'm not yet subscribed to this mailing list, so please cc a copy to my
personal e-mail address. Thanks a lot.

-- 
Cheers,
Sammy Lau
Outblaze Co. Ltd.
Mail: sammy@outblaze.com

Hardware Info:
===
Distribution: RedHat 6.2
Kernel: 2.2.15 (stable)
Network Interface: eepro100
CPU: PIII 550 (Dual)
Mother Board: Intel 440GX
Bios: PhoenixBIOS
Raid Controller: Mylex Acceleraid 250 (DAC960)
Memory: 2G (no parity check)

Error Log (Extract):
===
May 15 04:10:19 ws1 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
May 15 04:10:19 ws1 kernel: EXT2-fs error (device rd(48,1)): ext2_readdir: bad entry in directory #6163875: inode out of bounds - offset=36, inode=3681389987, rec_len=12, name_len=3
May 15 04:10:19 ws1 kernel: You probably have a hardware problem with your RAM chips
May 15 04:10:20 ws1 ntpd_initres[539]: config_timer: 0->2
May 15 04:10:21 ws1 kernel: EXT2-fs error (device rd(48,1)): ext2_readdir: bad entry in directory #9473424: inode out of bounds - offset=36, inode=3681095072, rec_len=12, name_len=3
May 15 04:10:36 ws1 kernel: EXT2-fs error (device rd(48,1)): ext2_readdir: bad entry in directory #9932201: inode out of bounds - offset=36, inode=3679194537, rec_len=12, name_len=3
May 15 04:10:44 ws1 kernel: EXT2-fs error (device rd(48,1)): ext2_readdir: bad entry in directory #3935659: inode out of bounds - offset=36, inode=3681389995, rec_len=12, name_len=3
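The EXT2-fs lines above can be read against the sanity checks ext2 applies to each directory entry during ext2_readdir. The sketch below is modeled on my understanding of the 2.2-era ext2_check_dir_entry() and is not the kernel's code; the enum names, check_dir_entry(), and its parameters are hypothetical. The log does not show the filesystem's block size or inode count, so the values used in the tests (4096-byte blocks, 10,000,000 inodes) are assumptions.

```c
#include <stdint.h>

/* Minimum record length for an entry whose name is name_len bytes:
 * 8 header bytes (inode, rec_len, name_len) plus the name, rounded up to 4. */
#define EXT2_DIR_PAD 4
#define EXT2_DIR_ROUND (EXT2_DIR_PAD - 1)
#define EXT2_DIR_REC_LEN(name_len) (((name_len) + 8 + EXT2_DIR_ROUND) & ~EXT2_DIR_ROUND)

/* Hypothetical result codes, roughly matching the kernel's reason strings. */
enum dirent_error {
    DIRENT_OK,
    DIRENT_REC_LEN_MIN,   /* "rec_len is smaller than minimal" */
    DIRENT_REC_LEN_ALIGN, /* "rec_len % 4 != 0" */
    DIRENT_REC_LEN_NAME,  /* "rec_len is too small for name_len" */
    DIRENT_ACROSS_BLOCKS, /* "directory entry across blocks" */
    DIRENT_INODE_OOB      /* "inode out of bounds" */
};

/* Checks are tried in order and the first failure is reported, so an
 * "inode out of bounds" message implies every earlier check passed. */
static enum dirent_error check_dir_entry(uint32_t inode, unsigned rec_len,
                                         unsigned name_len,
                                         unsigned long offset_in_block,
                                         unsigned long blocksize,
                                         uint32_t inodes_count)
{
    if (rec_len < EXT2_DIR_REC_LEN(1))
        return DIRENT_REC_LEN_MIN;
    if (rec_len % 4 != 0)
        return DIRENT_REC_LEN_ALIGN;
    if (rec_len < EXT2_DIR_REC_LEN(name_len))
        return DIRENT_REC_LEN_NAME;
    if (offset_in_block + rec_len > blocksize)
        return DIRENT_ACROSS_BLOCKS;
    if (inode > inodes_count)
        return DIRENT_INODE_OOB;
    return DIRENT_OK;
}
```

With the values from the first logged entry (rec_len=12, name_len=3, inode=3681389987), the length checks pass because EXT2_DIR_REC_LEN(3) is 12, and only the inode check trips. In other words, the entry's shape looks valid while its inode field holds a value far beyond any plausible inode count.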

dmesg:
===
serial number disabled.
OK.
CPU0: Intel Pentium III (Katmai) stepping 03
Total of 2 processors activated (2182.35 BogoMIPS).
enabling symmetric IO mode... ...done.
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-9, 2-10, 2-11, 2-16, 2-17, 2-18, 2-20, 2-22, 2-23 not connected.
number of MP IRQ sources: 18.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 00170011
....... : max redirection entries: 0017
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00  1    0    0   0   0    0    0    00
01 000 00  0    0    0   0   0    1    1    59
(Deleted lines)...
IRQ to pin mappings:
IRQ0 -> 2
IRQ1 -> 1
(Deleted lines)...
checking TSC synchronization across CPUs: passed.
PCI: PCI BIOS revision 2.10 entry at 0xfdab0
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI->APIC IRQ transform: (B0,I12,P0) -> 19
PCI->APIC IRQ transform: (B0,I12,P0) -> 19
PCI->APIC IRQ transform: (B0,I14,P0) -> 21
PCI->APIC IRQ transform: (B0,I18,P3) -> 21
PCI->APIC IRQ transform: (B3,I5,P0) -> 21
Linux NET4.0 for Linux 2.2
Based upon Swansea University Computer Society NET3.039
NET4: Unix domain sockets 1.0 for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
TCP: Hash tables configured (ehash 524288 bhash 65536)
Starting kswapd v 1.5
Detected PS/2 Mouse Port.
Serial driver version 4.27 with no serial options enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
pty: 256 Unix98 ptys configured
RAM disk driver initialized: 16 RAM disks of 4096K size
loop: registered device at major 7
PIIX4: IDE controller on PCI bus 00 dev 91
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x2860-0x2867, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x2868-0x286f, BIOS settings: hdc:pio, hdd:pio
hda: CRN-8241B, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
DAC960: ***** DAC960 RAID Driver Version 2.2.5 of 23 January 2000 *****
DAC960: Copyright 1998-2000 by Leonard N. Zubkoff <lnz@dandelion.com>
DAC960#0: Configuring Mylex DAC960PTL1 PCI RAID Controller
DAC960#0: Firmware Version: 4.07-0-29, Channels: 1, Memory Size: 8MB
DAC960#0: PCI Bus: 3, Device: 5, Function: 1, I/O Address: Unassigned
DAC960#0: PCI Address: 0xFC000000 mapped at 0xFC800000, IRQ Channel: 21
DAC960#0: Controller Queue Depth: 124, Maximum Blocks per Command: 128
DAC960#0: Driver Queue Depth: 123, Maximum Scatter/Gather Segments: 33
DAC960#0: Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
DAC960#0: Physical Devices:
DAC960#0: 0:0 Vendor: SEAGATE Model: ST318275LW Revision: 0001
DAC960#0: Serial Number: 3AK0HNAQ000070215QK9
DAC960#0: Disk Status: Standby, 35565568 blocks
(Deleted lines)...
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.20.2.3 $ 2000/03/02 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
eth0: Intel PCI EtherExpress Pro100 at 0xfc807000, 00:A0:C9:AC:A7:F5, IRQ 21.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
Receiver lock-up workaround activated.
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.20.2.3 $ 2000/03/02 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
Partition check:
sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 >
rd/c0d0: rd/c0d0p1 rd/c0d0p2
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 52k freed
Adding Swap: 1052216k swap-space (priority -1)
cat uses obsolete /proc/pci interface
Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
