Nasty oops (plural)

Christopher Wiles (wileyc@ai.cs.fujitsu.co.jp)
Tue, 25 Feb 1997 10:10:33 +0900 (JST)


All,

Since I upgraded my trusty 486/80 to a 486/120, I've been getting random
segfaults when I try to execute a program.

When I started my semimonthly logfile cleaning, I (finally) noticed that
each segfault was actually accompanied by a kernel oops. I ran each oops
through ksymoops (gritting my teeth when ksymoops segfaulted as well) and
noticed a pattern.

Here's one of them (with kernel 2.1.26, though 2.0.27 did the same thing):

Unable to handle kernel paging request at virtual address 810c0056
current->tss.cr3 = 00fa7000, Dr3 = 00fa7000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c012d686>]
EFLAGS: 00010246
eax: c120d018 ebx: 40000000 ecx: c0860000 edx: 00000000
esi: 000000e1 edi: 00000008 ebp: c0f28018 esp: c101cd04
ds: 0018 es: 0018 ss: 0018
Process grep (pid: 2148, process nr: 45, stackpage=c101c000)
Stack: c01b17fa c101cdd4 c101ce70 c1768c88 c0133c6d c101ce70 c01c58b0 fffffff8
c101ce70 00000000 00000006 bffffd5f 00000003 0000001d 0001fd5f c120d810
c0866ba0 c0866ba0 c0866ba0 00000286 0000002b 08058000 c101cda4 40001c60
Call Trace: [<c0133c6d>] [<c012d8fb>] [<c012daf5>] [<c0109d42>] [<c010a5f8>]
Code: 01 74 09 56 e8 41 89 ff ff 83 c4 04 46 c1 eb 01 75 ec eb ca
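The faulting address can be cross-checked against the register dump. As a worked sketch (the helper names are hypothetical, and only the one addressing form that appears here is handled): the first four bytes of the Code: line, 01 74 09 56, encode `add r/m32, r32`. ModRM 0x74 selects %esi as the source and says a SIB byte plus an 8-bit displacement follow; SIB 0x09 gives base %ecx, index %ecx, scale 1. The memory operand is therefore ecx + ecx + 0x56, wrapped to 32 bits.

```python
# Worked check of the oops above: decode the first instruction of the
# "Code:" line and recompute its memory operand from the register dump.
# Helper names are hypothetical; only mod=01 with a SIB byte is handled.

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

CODE = bytes.fromhex("01 74 09 56")   # first 4 bytes of the "Code:" line
REGS = {"ecx": 0xC0860000}            # ecx from the register dump
FAULT_ADDR = 0x810C0056               # "...at virtual address 810c0056"


def decode_add_rm32_r32(code: bytes) -> dict:
    """Split opcode 01 /r (add r/m32, r32) with a SIB byte and a disp8."""
    if code[0] != 0x01:
        raise ValueError("expected opcode 0x01 (add r/m32, r32)")
    modrm, sib, disp8 = code[1], code[2], code[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm != 0b100:
        raise ValueError("sketch only handles mod=01 with a SIB byte")
    index_field = (sib >> 3) & 7
    if index_field == 0b100:
        raise ValueError("index=100 means 'no index'; not handled here")
    return {
        "src": REGS32[reg],                 # register being added
        "scale": 1 << (sib >> 6),           # SIB scale field: 1, 2, 4 or 8
        "index": REGS32[index_field],
        "base": REGS32[sib & 7],
        "disp": disp8 - 256 if disp8 & 0x80 else disp8,  # disp8 is signed
    }


def effective_address(fields: dict, regs: dict) -> int:
    """base + index*scale + disp, wrapped to 32 bits as the 486 does."""
    ea = regs[fields["base"]] + regs[fields["index"]] * fields["scale"] + fields["disp"]
    return ea & 0xFFFFFFFF


fields = decode_add_rm32_r32(CODE)
ea = effective_address(fields, REGS)
print(fields)
print(f"effective address {ea:08x}, fault address {FAULT_ADDR:08x}")
```

With ecx = c0860000, doubling overflows 32 bits and the sum comes out to 810c0056, exactly the virtual address the kernel was unable to handle.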

Ksymoops replies:
Using `linux/System.map' to map addresses to symbols.

>>EIP: c012d686 <flush_old_exec+126/150>
Trace: c0133c6d <load_elf_binary+5ad/c00>
Trace: c012d8fb <search_binary_handler+2b/b0>
Trace: c012daf5 <do_execve+175/1d0>
Trace: c0109d42 <sys_execve+32/50>
Trace: c010a5f8 <system_call+38/3c>

Code: c012d686 <flush_old_exec+126/150> addl %esi,0x56(%ecx,%ecx,1)
Code: c012d68a <flush_old_exec+12a/150> call ffff894a <_EIP+ffff894a>
Code: c012d68f <flush_old_exec+12f/150> addl $0x4,%esp
Code: c012d692 <flush_old_exec+132/150> incl %esi
Code: c012d693 <flush_old_exec+133/150> shrl $0x1,%ebx
Code: c012d696 <flush_old_exec+136/150> jne fffffffe <_EIP+fffffffe>
Code: c012d698 <flush_old_exec+138/150> jmp ffffffde <_EIP+ffffffde>
Code: c012d69a <flush_old_exec+13a/150>
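ksymoops disassembled the Code: bytes as though they started at address 0, which is why the branch targets above are printed as offsets from _EIP. A small sketch (hypothetical helper names; only the three relative-branch opcodes that occur in these 20 bytes are handled) resolves them against the real EIP, c012d686. On x86, a relative branch is measured from the end of the branch instruction.

```python
# Resolve the _EIP-relative branch targets that ksymoops printed for the
# 20 "Code:" bytes. Helper names are hypothetical; only the relative
# call/jne/jmp forms that occur in these bytes are handled.

EIP = 0xC012D686
CODE = bytes.fromhex("01 74 09 56 e8 41 89 ff ff 83 c4 04 46 c1 eb 01 75 ec eb ca")

# Offset of each branch opcode within CODE, read off the ksymoops listing
BRANCHES = {"call": 4, "jne": 16, "jmp": 18}


def signed(value: int, bits: int) -> int:
    """Interpret an unsigned field as a two's-complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def branch_target(code: bytes, start: int, eip: int) -> int:
    """Absolute target of the relative branch whose opcode is code[start]."""
    op = code[start]
    if op == 0xE8:                     # call rel32
        length = 5
        rel = signed(int.from_bytes(code[start + 1:start + 5], "little"), 32)
    elif op in (0x75, 0xEB):           # jne rel8, jmp rel8
        length = 2
        rel = signed(code[start + 1], 8)
    else:
        raise ValueError(f"unhandled opcode {op:#04x}")
    # The displacement is relative to the address of the next instruction.
    return (eip + start + length + rel) & 0xFFFFFFFF


targets = {name: branch_target(CODE, off, EIP) for name, off in BRANCHES.items()}
for name, addr in targets.items():
    print(f"{name:4s} -> {addr:08x}  (EIP{addr - EIP:+#x})")
```

This gives c0125fd0 for the call, c012d684 for the jne and c012d664 for the jmp, matching ksymoops's _EIP+ffff894a, _EIP+fffffffe and _EIP+ffffffde. The jne target lies two bytes below the reported EIP, and those two bytes are not part of the Code: dump, which starts at EIP.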

I stress that each oops is _the_ _same_. They seem to be more frequent
under heavy load (where heavy means a load average greater than one), but
the machine will go a week without a problem and then I'll get a whole
bunch of them.

I'd really like someone to tell me that it's bad RAM. I'd really rather
not have someone tell me that the processor is hosed.

The kernel was compiled with gcc 2.7.2.2 and binutils 2.7. All executables
that trigger the oops are linked against glibc-2.0.1.

-- Chris (wileyc@ai.cs.fujitsu.co.jp)