Re: 2.2.1-ac3 Oops with many fds

Dan Christian (robodan@netscape.com)
Fri, 05 Feb 1999 12:41:23 -0800


I reproduced the oops over a serial port and caught everything. This time I
raised the system limit first, and things ran for over 20min under pounding
load. I had over 9000 fds in a single threaded process.

Things go down too fast for the oops to get into syslog, but they do make it out
the serial console.

The oops happens immediately after four "TLAN: Couldn't allocate memory for
receive data." messages. I'm going to try a system with different ethernet.

It dies again in TLan_HandleRxEOF.

-Dan

Here is what happed just before the oops (vmstat was running every 5 sec):

__procs ___________________memory __swap ______io __system _______cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 1152 3196 15684 32660 0 0 2 21 599 240 29 69 2
1 0 0 1152 3080 15696 32668 0 0 2 19 581 234 29 70 1
1 0 0 1152 3468 15460 32684 0 0 3 26 621 264 33 65 1
1 0 0 1152 3152 14704 32752 0 0 4 61 1295 717 45 52 3
3 4 0 1152 6008 12876 32900 0 0 11 262 3055 3741 71 29 0
2 1 0 1152 4680 13432 33116 0 0 19 184 3573 2911 70 30 0
2 50 1 1152 4000 10156 31568 0 0 15 214 4444 14218 66 34 0
TLAN: Couldn't allocate memory for received data.
TLAN: Couldn't allocate memory for received data.
TLAN: Couldn't allocate memory for received data.
TLAN: Couldn't allocate memory for received data.

Here is the new oops (via ksymoops):

Options used: -V (specified)
-O (specified)
-k /proc/ksyms (specified)
-l /proc/modules (default)
-M (specified)
-c 1 (default)

Unable to handle kernel NULL pointer dereference at virtual address 0000005c
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01945a2>]
EFLAGS: 00010046
eax: 00000000 ebx: 00000000 ecx: c0006800 edx: 0000004c
esi: c0006858 edi: c01e9cc4 ebp: c0004800 esp: c01efe7c
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c01ef000)
Stack: c01e9cc4 c019000c c0009078 00000286 c0006858 c0009078 c0006800 00000000
00000000 00000001 c0193805 c01e9cc4 0000000c c7bc1b00 04000001 0000000b
c01eff1c 0000000c c010a781 0000000b c01e9cc4 c01eff1c c01dbe74 0000000b
Call Trace: [<c019000c>] [<c0193805>] [<c010a781>] [<c010a53f>] [<c010a877>] [<c
0109794>] [<c0100018>]
[<c0117b74>] [<c010a88d>] [<c0109794>] [<c0107f21>] [<c0106000>] [<c0107f
[<c010607b>] [<c0106000>] [<c0100176>]
Code: 39 53 5c 76 0f 89 53 5c 03 93 80 00 00 00 89 93 84 00 00 00

>>EIP: c01945a2 <unregister_netdev+1f06/8e14>
Trace: c019000c <secure_tcp_sequence_number+719c/90b0>
Trace: c0193805 <unregister_netdev+1169/8e14>
Trace: c010a781 <dump_thread+23b5/23ec>
Trace: c010a53f <dump_thread+2173/23ec>
Trace: c010a877 <enable_irq+77/128>
Code: c01945a2 <unregister_netdev+1f06/8e14> 00000000 <_EIP>:
Code: c01945a2 <unregister_netdev+1f06/8e14> 0: 39 53 5c cmpl %
edx,0x5c(%ebx)
Code: c01945a5 <unregister_netdev+1f09/8e14> 3: 76 0f jbe 1
4 <_EIP+0x14> c01945b6 <unregister_netdev+1f1a/8e14>
Code: c01945a7 <unregister_netdev+1f0b/8e14> 5: 89 53 5c movl %
edx,0x5c(%ebx)
Code: c01945aa <unregister_netdev+1f0e/8e14> 8: 03 93 80 00 00 addl 0
x80(%ebx),%edx
Code: c01945af <unregister_netdev+1f13/8e14> d: 00
Code: c01945b0 <unregister_netdev+1f14/8e14> e: 89 93 84 00 00 movl %
edx,0x84(%ebx)
Code: c01945b5 <unregister_netdev+1f19/8e14> 13: 00

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing

Alan Cox wrote:

> > I raise the system limit while things were running. This seems like
> > quite a bad idea in hindsite :-{. It did run for a many seconds past
>
> It shouldnt be
>
> > TLAN: Couldn't allocate memory for receive data.
>
> Do you know from your logs if the TLAN message occured about the same time
> as the others ?
>
> > EIP is in:
> > TLan_HandleRxEOF
>
> That looks like the thunderlan driver did something wrong in the out of memory
> situation rather than the big file handles.
>
> > There was an interesting syslog line many seconds before the oops:
> > Feb 4 14:48:04 tanker kernel: VFS: file-max limit 4096 reached
>
> You hit your global limit of 4096 different files

--
_______________________________________________________________________
Disraeli was pretty close: actually, there are Lies, Damn lies, Statistics,
Benchmarks, and Delivery dates.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/