2.0.36 + SMP + General Protection Fault

Trevor Hilaiel (trevor@portland.netmind.com)
Thu, 24 Jun 1999 20:52:45 -0700


I apologize ahead of time if I just need to RTFM, all I ask is that
someone point me in the right direction:

I'm running into general protection faults and I would appreciate some
help in diagnosing the problem. I'm running 2.0.36 on two machines: a
Dell Poweredge and a VA Research, both SMP. The trigger is a
multithreaded application that does tons of network IO (hundreds of
simultaneous connections by different threads). I'm getting the
following syslog messages (five or six in a couple seconds):

Jun 23 16:43:58 katmandu kernel: general protection: 0000
Jun 23 16:43:58 katmandu kernel: CPU: 1
Jun 23 16:43:58 katmandu kernel: EIP: 0010:[eth_type_trans+33/164]
Jun 23 16:43:58 katmandu kernel: EFLAGS: 00010202
Jun 23 16:43:58 katmandu kernel: eax: f000b20e ebx: 00000002 ecx: 00000000 edx: 00000090
Jun 23 16:43:58 katmandu kernel: esi: 012e17f8 edi: 00000002 ebp: 00000000 esp: 0c79ee44
Jun 23 16:43:58 katmandu kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jun 23 16:43:58 katmandu kernel: Process m2 (pid: 9766, process nr: 261, stackpage=0c79e000)
Jun 23 16:43:58 katmandu kernel: Stack: 0013ec08 0013585e 00000000 012e17f8 00000002 00000000 00000013 012e17f8
Jun 23 16:43:58 katmandu kernel: 00000000 00000000 00000000 00135a99 00000002 00000000 012e17f8 00002710
Jun 23 16:43:58 katmandu kernel: 00000000 00000100 b65fc914 00000001 0d347000 00000001 0d347000 00135dee
Jun 23 16:43:58 katmandu kernel: Call Trace: [eth_type_trans+8/164] [sys_msgctl+546/1188] [sys_msgctl+1117/1188] [newary+330/360] [unix_accept+62/436] [die_if_kernel+390/680]
Jun 23 16:43:58 katmandu kernel: Code: 8b 40 24 85 c0 74 0c 51 53 52 ff d0 83 c4 0c 5b c3 89 f6 31

At one point the VA Reseach machine appeared to be leaking process
descriptors, to the point where I maxed out with only seventy processes
running. Similar for file descriptors. To get the machine back into a
sane state I had to reboot.

My questions are:

1. Where can I go for information on deciphering this data to either
solve the problem myself (or at least understand the cause), or
provide smarter people with enough information to be able to quickly
and painlessly understand what is going on?

This may sound naive, but I read oops-tracing.txt but the document
didn't seem to be distributed with 2.0.36 so I wanted to insure that I
had the right information for diagnosing 2.0.

Thanks in advance for any advice.

-- 
Trevor Hilaiel

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/