Re: Oops and painful death of box, possibly solved

Simon Kirby (sim@netnation.com)
Wed, 31 Mar 1999 20:06:47 -0800 (PST)


What kernel version, what compiler, and what binutils (ld -v) were used?
0x0000000d looks either like odd memory corruption or a broken compile.
Did the two machines that were screwing up get the same OOPSes exactly?
Was MTRR enabled in the config? Were all servers running the same kernel?

I doubt disabling of the IDE would affect the dentry cache in any way.
You can boot the kernel with ide0=noprobe ide1=noprobe to stop it from
touching IDE (or take it out of the kernel).

How's it going, btw? ;)

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |

---
On Wed, 31 Mar 1999, Rick Franchuk wrote:

> Recently, I had a contractor of mine install a five Intel boxes (PII-400s and > PII-450s) in a provider in San Jose. Although all the pieces in all the > machines were identical, two started producing the following oops under what > appeared to be moderate to heavy disk usage: > > Unable to handle kernel NULL pointer dereference at virtual address 0000000b > current->tss.cr3 = 012c7000, pr3 = 012c7000 > *pde = 00000000 > Oops: 0000 > CPU: 0 > EIP: 0010:[<c012d075>] > EFLAGS: 00010292 > eax: 00001960 ebx: fffffff3 ecx: 49913b2c edx: 49913fb4 > esi: c020d394 edi: 00000001 ebp: 0000000b esp: c54c5f38 > ds: 0018 es: 0018 ss: 0018 > Process httpd (pid: 17592, process nr: 58, stackpage=c54c5000) > Stack: 00000001 c2355c00 c020d394 c301301d 874e0363 0000000e c01288b4 c2355c00 > c54c5f80 c54c5f80 c0128ae0 c2355c00 c54c5f80 c3013000 c3013000 00000001 > bffffbd0 c3013000 c301301d 0000000e 874e0363 c0128bc5 c3013000 00000000 > Call Trace: [<c01288b4>] [<c0128ae0>] [<c0128bc5>] [<c0126caf>] [<c0107a40>] > Code: 8b 6d 00 8b 74 24 18 39 73 48 75 eb 8b 74 24 24 39 73 0c 75 > > >>EIP: c012d075 <d_lookup+65/dc> > Trace: c01288b4 <cached_lookup+10/4c> > Trace: c0128ae0 <lookup_dentry+fc/1b8> > Trace: c0128bc5 <__namei+29/5c> > Trace: c0126caf <sys_newstat+13/64> > Trace: c0107a40 <system_call+34/38> > Code: c012d075 <d_lookup+65/dc> 00000000 <_EIP>: <=== > Code: c012d075 <d_lookup+65/dc> 0: 8b 6d 00 movl 0x0(%ebp),%ebp <=== > Code: c012d078 <d_lookup+68/dc> 3: 8b 74 24 18 movl 0x18(%esp,1),%esi > Code: c012d07c <d_lookup+6c/dc> 7: 39 73 48 cmpl %esi,0x48(%ebx) > Code: c012d07f <d_lookup+6f/dc> a: 75 eb jne c012d06c <d_lookup+5c/dc> > Code: c012d081 <d_lookup+71/dc> c: 8b 74 24 24 movl 0x24(%esp,1),%esi > Code: c012d085 <d_lookup+75/dc> 10: 39 73 0c cmpl %esi,0xc(%ebx) > Code: c012d088 <d_lookup+78/dc> 13: 75 00 jne c012d08a <d_lookup+7a/dc> > > A numer of oopses would happen in rapid succession, followed by segfaults of > whatever happened to be running and 'cannot fork()' messages streaming down > the screen locally (I never saw them though... I'm in vancouver, so I can't > detail exactly what was on the screen if it wasn't logged). > > Curiously, the machine also exhibited the following during boot up (Which was > annoying, because the 'timeouts' involved were fairly long): > > hda: no response (status = 0xa1), resetting drive > hda: no response (status = 0xa1) > hdb: no response (status = 0xa1), resetting drive > hdb: no response (status = 0xa1) > hdc: no response (status = 0xa1), resetting drive > hdc: no response (status = 0xa1) > hdd: no response (status = 0xa1), resetting drive > hdd: no response (status = 0xa1) > > I have a feeling that this is significant, as once I was able to get our man > in Cali to completely disable all onboard IDE controllers (we run 100% SCSI > using Adaptec 2940UWs, but the OOPSen flared up when on an NCR53c875 we > decided to test), the oops now SEEM to have totally dissolved... I'm writing > in hopes that it could be confirmed that this is indeed the source of the > error (to let me sleep sounder at night) and if it's a specific board-related > issue I can find out the model number so you all can avoid it. ;) > > -- > __________________________________________ > | | > | Rick Franchuk - TranSpecT Consulting | > |_______ _______| > \mailto:rickf@transpect.net/ > \_____ICQ_#_4435025______/ > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.rutgers.edu > Please read the FAQ at http://www.tux.org/lkml/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/