Lockup 2.1.6* => kmalloc/slab ???

Frank van de Pol (frank@obelix.fvdpol.inter.nl.net)
Thu, 6 Nov 1997 23:15:58 +0100 (MET)


Hi,

I've been playing with my 'occasional lockup problem' (see my posting of 5
Nov 1997). I still get these lockups, both when accessing my web server over
the ethernet and from localhost.

After killing a running X server (big processes!) from some VC using the SAK
key, I can bring my machine back to life.

I tried to figure out what/why the 'Total failed network buffer allocs'
figure was sky-rocketing:

This figure is incremented in alloc_skb() (in net/core/skbuff.c) when the
kmalloc() fails. I put some debug code around it, and at the times it locks
up I see that kmalloc() returns NULL when allocating blocks of 4020 or 3332
bytes (priority is 3, which is GFP_KERNEL).
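
The kind of instrumentation described above can be sketched in user space
as follows. This is only an illustration, not the actual kernel patch: the
names fake_kmalloc(), debug_alloc(), failed_allocs and fail_above are all
hypothetical stand-ins, and the fake allocator simply refuses large requests
so that the failure path gets exercised.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: in the kernel these would be kmalloc(), printk()
 * and the counter behind "Total failed network buffer allocs". */
static unsigned long failed_allocs;  /* number of failed allocations */
static size_t fail_above = 4000;     /* test knob: larger requests fail */

/* Fake allocator that fails for large requests so the failure path runs. */
static void *fake_kmalloc(size_t size, int priority)
{
    (void)priority;  /* unused in this sketch */
    return size > fail_above ? NULL : malloc(size);
}

/* Allocation wrapper that counts and logs every failure. */
static void *debug_alloc(size_t size, int priority)
{
    void *p;

    assert(size > 0);
    p = fake_kmalloc(size, priority);
    if (p == NULL) {
        failed_allocs++;
        fprintf(stderr, "kmalloc fails; size=%lu, priority=%d\n",
                (unsigned long)size, priority);
    }
    return p;
}
```

Calling debug_alloc(4020, 3) with the default knob logs one failure and
bumps failed_allocs, while debug_alloc(3332, 3) succeeds.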

The failing kmalloc() apparently goes through the list of memory caches,
trying to find one that is big enough for the requested size. In my case
this is a 4096 byte cache.
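
A minimal sketch of that lookup, assuming the general-purpose caches are
exactly the size-32 ... size-131072 entries listed in /proc/slabinfo below
and that the first cache large enough wins. The function name
pick_cache_size() is hypothetical; the real kmalloc() may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Cache sizes as listed in /proc/slabinfo (size-32 ... size-131072). */
static const size_t cache_sizes[] = {
    32, 64, 128, 256, 512, 1024, 2048, 4096,
    8192, 16384, 32768, 65536, 131072
};

/* Return the object size of the first cache that fits the request,
 * or 0 if the request is larger than the biggest cache. */
static size_t pick_cache_size(size_t request)
{
    size_t i;
    size_t n = sizeof(cache_sizes) / sizeof(cache_sizes[0]);

    assert(request > 0);
    for (i = 0; i < n; i++) {
        if (request <= cache_sizes[i])
            return cache_sizes[i];
    }
    return 0;
}
```

Under these assumptions both failing sizes, 4020 and 3332 bytes, land in
the 4096 byte cache.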

Then I lost track. The __kmem_cache_alloc() function should either allocate
an element or grow the 4096 byte cache if it is full. I get NO failure
message from this routine, although it should print one when it returns
NULL...
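
To make the expected behaviour concrete, here is a toy model of
"allocate an element, or grow the cache if it is full". It tracks only
counters, not real memory, and it is not the slab allocator's code:
struct toy_cache, toy_cache_grow() and toy_cache_alloc() are hypothetical
names, and the simulated page pool is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy cache: counters only, no real memory. */
struct toy_cache {
    unsigned int objs_per_slab;   /* objects added by each grow */
    unsigned int total;           /* objects the cache currently holds */
    unsigned int in_use;          /* objects handed out */
    unsigned int pages_free;      /* simulated page pool */
    unsigned int pages_per_slab;  /* pages needed for one grow */
    unsigned int errors;          /* failed allocations */
};

/* Try to add one slab; returns 1 on success, 0 if the pool is too small. */
static int toy_cache_grow(struct toy_cache *c)
{
    if (c->pages_free < c->pages_per_slab)
        return 0;
    c->pages_free -= c->pages_per_slab;
    c->total += c->objs_per_slab;
    return 1;
}

/* Allocate one object: take a free one, else grow the cache first.
 * Returns 1 on success; on failure logs a message and returns 0. */
static int toy_cache_alloc(struct toy_cache *c)
{
    assert(c != NULL && c->objs_per_slab > 0);
    if (c->in_use == c->total && !toy_cache_grow(c)) {
        c->errors++;
        fprintf(stderr, "toy_cache_alloc: cache full and grow failed\n");
        return 0;
    }
    c->in_use++;
    return 1;
}
```

In this model a failed allocation always leaves a trace (a message and an
error count), which is the behaviour I expected to see from the real
routine.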

Of course I can be completely wrong in my 'guessing'; I admit I don't
understand the inner workings of the slab allocator (and a lot of other
parts of the kernel)...

Regards,

Frank.

Additional info follows:

=== before machine locked up ===

/proc/slabinfo:

slabinfo - version: 1.0 (statistics)
kmem_cache           22    31   1   1    1    22     22    1    0  0
tcp_open_request      0     0   0   0    0     1     26    4    4  0
sock                 85    95  19  19   19   100    382   22    3  0
filp                214   252   6   6    6   214    214    6    0  0
buffer_head         736   840  20  20   20  2400  25332   63   43  0
mm_struct            46    62   2   2    2    46    397    2    0  0
vm_area_struct      626   630  10  10   10   629  22273   10    0  0
files_cache          49    49   7   7   56    49    399    7    0  0
uid_cache             5   127   1   1    1     5      5    1    0  0
size-131072           0     0   0   0    0     0      0    0    0  0
size-65536            0     0   0   0    0     0      0    0    0  0
size-32768            1     1   1   1    8     1      1    1    0  0
size-16384            0     0   0   0    0     0      0    0    0  0
size-8192             7     8   4   4   16     7      8    4    0  0
size-4096           117   120  30  30  120   250  11096  686  656  0
size-2048            21    24   3   3   12    47   1425   49   46  0
size-1024           105   112  14  14   28   105   1685   28   14  0
size-512             10    16   2   2    2    22   9780    3    1  0
size-256             72    84   6   6    6    90   3771   35   29  0
size-128            134   150   6   6    6   135    494    6    0  0
size-64             849   882  21  21   21   856   3563   21    0  0
size-32             998  1071  17  17   17  1031   5031   19    2  0
slab_cache           54   126   2   2    2    86    771    4    2  0

Nov 6 22:29:39 obelix kernel: SysRq: Show Memory
Nov 6 22:29:39 obelix kernel: Mem-info:
Nov 6 22:29:39 obelix kernel: Free pages: 504kB
Nov 6 22:29:39 obelix kernel: ( 10*4kB 10*8kB 4*16kB 2*32kB 0*64kB 2*128kB = 504kB)
Nov 6 22:29:39 obelix kernel: Swap cache: add 0/0, delete 156368/0, find 41/0
Nov 6 22:29:39 obelix kernel: Free swap: 130748kB
Nov 6 22:29:39 obelix kernel: 8192 pages of RAM
Nov 6 22:29:39 obelix kernel: 392 free pages
Nov 6 22:29:39 obelix kernel: 458 reserved pages
Nov 6 22:29:39 obelix kernel: 6843 pages shared
Nov 6 22:29:39 obelix kernel: Buffer memory: 840kB
Nov 6 22:29:39 obelix kernel: Buffer heads: 876
Nov 6 22:29:39 obelix kernel: Buffer blocks: 840
Nov 6 22:29:39 obelix kernel: CLEAN: 98 buffers, 37 used (last=37), 0 locked, 0 protected, 0 dirty
Nov 6 22:29:39 obelix kernel: LOCKED: 390 buffers, 21 used (last=21), 0 locked, 0 protected, 0 dirty
Nov 6 22:29:39 obelix kernel: DIRTY: 321 buffers, 7 used (last=321), 0 locked, 0 protected, 321 dirty
Nov 6 22:29:39 obelix kernel: Networking buffers in use : 255
Nov 6 22:29:39 obelix kernel: Total network buffer allocations : 58383
Nov 6 22:29:39 obelix kernel: Total failed network buffer allocs : 0

=== During lockup ===

Nov 6 22:34:55 obelix kernel: free sibling
Nov 6 22:34:55 obelix kernel: task PC stack pid father child younger older
Nov 6 22:34:55 obelix kernel: init 1 S FFFFFC18 0 1 0 347
Nov 6 22:34:55 obelix kernel: kflushd 2 S FFFFFC18 0 2 1 3
Nov 6 22:34:55 obelix kernel: kswapd 3 R 0000000C 0 3 1 9 2
Nov 6 22:34:55 obelix kernel: update 8 R FFFFFC18 0 9 1 227 3
Nov 6 22:34:55 obelix kernel: crond 6 R FFFFFC18 0 227 1 242 9
Nov 6 22:34:55 obelix kernel: klogd 9 R 00000027 0 242 1 244 227
Nov 6 22:34:55 obelix kernel: rpc.portmap 5 S 00000011 0 244 1 246 242
Nov 6 22:34:55 obelix kernel: inetd 10 S 0000000F 0 246 1 248 244
Nov 6 22:34:55 obelix kernel: named 11 S 00000002 0 248 1 250 246
Nov 6 22:34:55 obelix kernel: rwhod 12 S FFFFFC18 0 250 1 252 248
Nov 6 22:34:55 obelix kernel: lpd 13 S FFFFFC18 0 252 1 258 262 250
Nov 6 22:34:55 obelix kernel: lpd 17 R FFFFFC18 0 258 252
Nov 6 22:34:55 obelix kernel: rpc.nfsd 19 R FFFFFC18 0 262 1 265 252
Nov 6 22:34:55 obelix kernel: rpc.pcnfsd 7 S FFFFFC18 0 265 1 269 262
Nov 6 22:34:55 obelix kernel: rpc.lwpnfsd 15 S FFFFFC18 0 269 1 272 265
Nov 6 22:34:55 obelix kernel: rpc.mountd 18 R FFFFFC18 0 272 1 273 269
Nov 6 22:34:55 obelix kernel: httpd 14 R 00000002 0 273 1 390 279 272
Nov 6 22:34:55 obelix kernel: httpd -16 R current 0 275 273 276
Nov 6 22:34:55 obelix kernel: httpd 21 S 00000025 0 276 273 388 275
Nov 6 22:34:55 obelix kernel: sendmail 22 S FFFFFC18 0 279 1 304 273
Nov 6 22:34:55 obelix kernel: innd 24 S FFFFFC18 0 304 1 327 306 279
Nov 6 22:34:55 obelix kernel: RunCache 23 S 00000005 0 306 1 328 317 304
Nov 6 22:34:55 obelix kernel: login 4 S 00000005 0 317 1 334 318 306
Nov 6 22:34:55 obelix kernel: login 26 S 00000005 0 318 1 352 319 317
Nov 6 22:34:55 obelix kernel: login 25 S 00000006 0 319 1 360 320 318
Nov 6 22:34:55 obelix kernel: agetty 28 S 00000000 0 320 1 321 319
Nov 6 22:34:55 obelix kernel: agetty 29 S FFFFFC18 0 321 1 322 320
Nov 6 22:34:55 obelix kernel: agetty 30 S FFFFFC18 0 322 1 323 321
Nov 6 22:34:55 obelix kernel: vgetty 31 S 00000015 0 323 1 326 322
Nov 6 22:34:55 obelix kernel: gpm 34 S FFFFFC18 0 326 1 332 323
Nov 6 22:34:55 obelix kernel: overchan 35 S FFFFFC18 0 327 304
Nov 6 22:34:55 obelix kernel: squid 32 R FFFFFC18 0 328 306 331
Nov 6 22:34:55 obelix kernel: dnsserver 33 S 00000000 0 329 328 330
Nov 6 22:34:55 obelix kernel: dnsserver 27 S FFFFFC18 0 330 328 331 329
Nov 6 22:34:55 obelix kernel: ftpget 36 S FFFFFC18 0 331 328 330
Nov 6 22:34:55 obelix kernel: xdm 37 S 00000002 0 332 1 347 326
Nov 6 22:34:55 obelix kernel: bash 20 R 0000000F 0 334 317
Nov 6 22:34:55 obelix kernel: syslogd 38 R 00000003 0 347 1 332
Nov 6 22:34:55 obelix kernel: bash 39 R 00000013 0 352 318
Nov 6 22:34:55 obelix kernel: bash 40 S 00000008 0 360 319 367
Nov 6 22:34:55 obelix kernel: xinit 42 S 00000014 0 367 360 371
Nov 6 22:34:55 obelix kernel: X 43 S FFFFFC18 0 368 367 371
Nov 6 22:34:55 obelix kernel: olvwm 41 S 00000020 0 371 367 410 368
Nov 6 22:34:55 obelix kernel: olwmslave 44 S FFFFFC18 0 375 371 408
Nov 6 22:34:55 obelix kernel: httpd 47 S 00000016 0 388 273 389 276
Nov 6 22:34:55 obelix kernel: httpd 46 S 00000024 0 389 273 390 388
Nov 6 22:34:55 obelix kernel: httpd 48 S 00000020 0 390 273 389
Nov 6 22:34:55 obelix kernel: xterm 49 S 0000001F 0 408 371 412 409 375
Nov 6 22:34:55 obelix kernel: xterm 50 S 0000001F 0 409 371 411 410 408
Nov 6 22:34:55 obelix kernel: netscape 51 R FFFFFC18 0 410 371 409
Nov 6 22:34:55 obelix kernel: bash 52 S 0000001C 0 411 409
Nov 6 22:34:55 obelix kernel: bash 53 S 0000001B 0 412 408
Nov 6 22:34:55 obelix kernel: SysRq: Show Memory
Nov 6 22:34:55 obelix kernel: Mem-info:
Nov 6 22:34:55 obelix kernel: Free pages: 292kB
Nov 6 22:34:55 obelix kernel: ( 45*4kB 14*8kB 0*16kB 0*32kB 0*64kB 0*128kB = 292kB)
Nov 6 22:34:55 obelix kernel: Swap cache: add 0/0, delete 189757/0, find 1490/0
Nov 6 22:34:55 obelix kernel: Free swap: 129556kB
Nov 6 22:34:55 obelix kernel: 8192 pages of RAM
Nov 6 22:34:55 obelix kernel: 285 free pages
Nov 6 22:34:55 obelix kernel: 458 reserved pages
Nov 6 22:34:55 obelix kernel: 5469 pages shared
Nov 6 22:34:55 obelix kernel: Buffer memory: 160kB
Nov 6 22:34:55 obelix kernel: Buffer heads: 265
Nov 6 22:34:55 obelix kernel: Buffer blocks: 160
Nov 6 22:34:55 obelix kernel: CLEAN: 84 buffers, 37 used (last=81), 0 locked, 0 protected, 0 dirty
Nov 6 22:34:55 obelix kernel: LOCKED: 44 buffers, 28 used (last=31), 0 locked, 0 protected, 0 dirty
Nov 6 22:34:55 obelix kernel: DIRTY: 8 buffers, 0 used (last=0), 0 locked, 0 protected, 8 dirty
Nov 6 22:34:55 obelix kernel: Networking buffers in use : 129
Nov 6 22:34:55 obelix kernel: Total network buffer allocations : 64892
Nov 6 22:34:55 obelix kernel: Total failed network buffer allocs : 4003654
Nov 6 22:34:55 obelix kernel: IP fragment buffer size : 0
Nov 6 22:34:55 obelix kernel: SysRq: SAK

No other kernel messages crawl over the screen, only an occasional printk
from my own debugging code in alloc_skb():

Nov 6 22:34:55 obelix kernel: kmalloc fails in alloc_skb() in skbuf.c; size=3332, priority=3
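
For comparing the two "Free pages" dumps, the per-size free list in the
SysRq output can be summed mechanically. The helper below is a sketch;
parse_free_list() is a hypothetical name, and it assumes the line has the
"( count*sizekB ... = totalkB)" shape shown in the logs above. Besides the
total it reports the largest block size that still has a free entry.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a free-list line such as "( 45*4kB 14*8kB ... = 292kB)".
 * Stores the summed size in *total_kb and the largest block size with at
 * least one free entry in *largest_kb (0 if there is none).
 * Returns the number of "count*sizekB" terms parsed. */
static int parse_free_list(const char *line, unsigned long *total_kb,
                           unsigned long *largest_kb)
{
    unsigned long count, size;
    int used, terms = 0;

    assert(line != NULL && total_kb != NULL && largest_kb != NULL);
    *total_kb = 0;
    *largest_kb = 0;
    while (*line == '(' || *line == ' ')
        line++;  /* skip the opening parenthesis and padding */
    for (;;) {
        used = 0;
        if (sscanf(line, " %lu*%lukB%n", &count, &size, &used) != 2 ||
            used == 0)
            break;  /* reached "= NNNkB)" or a malformed term */
        *total_kb += count * size;
        if (count > 0 && size > *largest_kb)
            *largest_kb = size;
        terms++;
        line += used;
    }
    return terms;
}
```

Fed the two lines from the logs, this gives 504kB with a largest free block
of 128kB before the lockup, and 292kB with a largest free block of 8kB
during it.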

=== after recovering from lockup ===

obelix:~$ more /tmp/slabinfo.3
slabinfo - version: 1.0 (statistics)
kmem_cache           22    31   1   1    1    22     22    1    0  0
tcp_open_request      0     0   0   0    0     1     36    5    5  0
sock                 67    95  15  19   19   100    429   24    5  0
filp                323   336   8   8    8   323    323    8    0  0
buffer_head         304   924  22  22   22  2920  36752  114   92  0
mm_struct            39    62   2   2    2    54    432    2    0  0
vm_area_struct      450   756   8  12   12   815  26160   14    2  0
files_cache          42    56   7   8   64    57    434    9    1  0
uid_cache             4   127   1   1    1     5      5    1    0  0
size-131072           0     0   0   0    0     0      0    0    0  0
size-65536            0     0   0   0    0     1      1    1    1  0
size-32768            1     1   1   1    8     1      1    1    0  0
size-16384            0     0   0   0    0     0      0    0    0  0
size-8192             7     8   4   4   16     8      9    4    0  0
size-4096             0    72   0  18   72   250  16565  817  799  0
size-2048             3     8   1   1    4    47   2122   65   64  0
size-1024            44    88   7  11   22   111   3496   38   27  0
size-512             10    24   2   3    3    22  17909   54   51  0
size-256             34   168   3  12   12   167   9503   58   46  0
size-128            133   150   6   6    6   135    530    6    0  0
size-64             950  1008  24  24   24   977   3916   24    0  0
size-32            1065  1134  18  18   18  1107   5636   20    2  0
slab_cache           38    63   1   1    1    86    980    4    3  0
obelix:~$
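
To compare the slabinfo dumps field by field, a line can be split into its
ten numbers with a small helper. Note that the field names below are an
ASSUMPTION about the column order of this slabinfo version (active objects,
total objects, active slabs, total slabs, pages, then high mark,
allocations, grown, reaped, errors); the dump itself carries no column
headers. struct slab_line and parse_slab_line() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* One slabinfo line. Field names are ASSUMED, not taken from the dump. */
struct slab_line {
    char name[32];
    unsigned long active_objs, total_objs;
    unsigned long active_slabs, total_slabs, pages;
    unsigned long high_mark, allocs, grown, reaped, errors;
};

/* Parse one line: a name followed by ten unsigned numbers.
 * Returns 1 on success, 0 if the line does not have that shape
 * (for example the "slabinfo - version:" header). */
static int parse_slab_line(const char *line, struct slab_line *s)
{
    assert(line != NULL && s != NULL);
    return sscanf(line, "%31s %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
                  s->name, &s->active_objs, &s->total_objs,
                  &s->active_slabs, &s->total_slabs, &s->pages,
                  &s->high_mark, &s->allocs, &s->grown, &s->reaped,
                  &s->errors) == 11;
}
```

If the assumed column order is right, the size-4096 line after the lockup
would read as 0 active objects out of 72, with the cache grown 817 times
and reaped 799 times.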

System info:
Kernel: 2.1.61
Machine: Intel P60, 32MB RAM
ah2940 SCSI disk

========================---------------->
#define NAME "Frank van de Pol"
#define ADDRESS "mgr. Nelislaan 10"
#define CITY "4741 AB Hoeven"
#define COUNTRY "The Netherlands"
#define EMAIL "F.K.W.van.de.Pol@inter.NL.net"

Linux - Why use Windows, since there is a door?