Early warning of MM/swap problems in 2.1.129 (LONG)

Neil Conway (nconway.list@ukaea.org.uk)
Thu, 19 Nov 1998 18:26:54 +0000


This report still needs editing, and as such is for the VERY interested only.
I wanted to give people a heads-up before the weekend, as I don't expect to
have time to take it much further before then.

The salient points are:
1) Running large programs (they seem to need to be bigger than physical
memory to trigger it) frequently fails, even when lots of swap is available
and the lower limits for pagecache and buffermem are set to 1%. This has
been happening since at least 2.1.108.

2) While testing this, I got 2.1.129 to end up with 160MB of swap
"used" with nothing really running at all. ps axu showed nothing bigger
than a meg or two.

The system is a Dell 410 with 512MB RAM, two 4GB Quantums on the Adaptec 7890
and another 4GB Quantum on a 7880. The first part of the testing was done
under 2.1.108, the second half under 2.1.129. Swap is striped across both
primary disks. The setup is RedHat 5.1.

The LONG report follows...

Neil Conway
neil.conway@ukaea.org.uk

report on new tests, 19/11/98

Still on old kernel so far:
fuslwa:~> uname -a
Linux fuslwa.culham.ukaea.org.uk 2.1.108 #15 SMP Tue Sep 15 08:54:32 BST 1998 i686 unknown

Have memory situation:
fuslwa:~> free
total used free shared buffers cached
Mem: 516732 192644 324088 10108 157288 10116
-/+ buffers/cache: 25240 491492
Swap: 522992 1464 521528

but even though free memory (counting reclaimable buffers/cache) plus free
swap (491492 + 521528 kB) comes to roughly 989MB, I get:

fuslwa:~> /old/home/njconway/memstuff/mem1 880
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)

(By the way, the /proc/sys/vm settings are:
[njconway@fuslwa vm]# cat pagecache buffermem
1 30 75
1 10 30
i.e. the minimum percentages are already down at 1% - the three fields in
each file being, if I remember right, min_percent, borrow_percent and
max_percent.)

This is clearly faulty: the allocation itself succeeded, so writing to the
memory should not end in a core dump.
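
(mem1 itself is nothing clever - essentially just "malloc the requested
number of megabytes, then write to every page so the kernel has to back the
whole thing with RAM or swap". What follows is a from-memory sketch rather
than the exact source.)

#include <stdio.h>
#include <stdlib.h>

/* mem1 <megabytes>: allocate and then touch the lot.  Sketch only. */
int main(int argc, char **argv)
{
    size_t mb, i, bytes;
    char *p;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <megabytes>\n", argv[0]);
        return 1;
    }
    mb = (size_t)atol(argv[1]);
    bytes = mb * 1024 * 1024;

    p = malloc(bytes);
    if (p == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }
    printf("Allocated the memory\n");

    printf("Writing to the memory\n");
    for (i = 0; i < bytes; i += 4096)    /* one byte per 4kB page */
        p[i] = 1;                        /* the SIGSEGVs happen in here */

    printf("Finished\n");
    return 0;
}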

Now I need to flush the buffers and then try it again (presumably it'll work).
Then I want to refill the buffers and/or cache and make sure the problem is
reproducible. Then perform the same check with a newer kernel (e.g. today's
2.1.129).

So:
fuslwa:~> free
total used free shared buffers cached
Mem: 516732 192652 324080 10108 157288 10116
-/+ buffers/cache: 25248 491484
Swap: 522992 1464 521528
fuslwa:~> /old/home/njconway/memstuff/mem1 480
Allocated the memory
Writing to the memory
Finished
fuslwa:~> free
total used free shared buffers cached
Mem: 516732 26984 489748 8504 5240 9884
-/+ buffers/cache: 11860 504872
Swap: 522992 1464 521528

This worked, and cleared out the buffers.

Now show that the system works "nominally":
fuslwa:~> time /old/home/njconway/memstuff/mem1 480
Allocated the memory
Writing to the memory
Finished
0.570u 3.440s 0:04.01 100.0% 0+0k 0+0io 78pf+0w
fuslwa:~> time /old/home/njconway/memstuff/mem1 950
Allocated the memory
Writing to the memory
Finished
1.030u 68.420s 2:47.83 41.3% 0+0k 0+0io 116669pf+119270w

So the 950 meg case took a long time but worked.

Next, I used hdparm a few times (getting 12.8MB/s or so from the two primary
disks, which makes me wonder why the swapping was only running at about
3MB/s - check this later), and this filled up the buffers again:
fuslwa:~> free
total used free shared buffers cached
Mem: 516732 162028 354704 4156 136248 4732
-/+ buffers/cache: 21048 495684
Swap: 522992 4132 518860

and now use mem1 again:
fuslwa:~> time /old/home/njconway/memstuff/mem1 950
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.010u 0.010s 0:05.87 0.3% 0+0k 0+0io 102pf+0w

Note that on this occasion (and some earlier too) it didn't core dump
immediately but took a little while. Sometimes it's immediate. Strange,
that. In fact, right now even mem1 2000 doesn't fail right away. Weird.
Another bug?

Hmm - when I load up not only the buffers but also the cache with lots of
stuff, suddenly I find that the program core dumps immediately:
fuslwa:/home/njconway> free
total used free shared buffers cached
Mem: 516732 275516 241216 4432 137656 116624
-/+ buffers/cache: 21236 495496
Swap: 522992 4072 518920
fuslwa:/home/njconway> time /old/home/njconway/memstuff/mem1 950
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.000s 0:00.04 0.0% 0+0k 0+0io 99pf+0w

This is somewhat puzzling and deserves further investigation or consultation
with the guys on the list.
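
One idea for pinning this down (which I haven't tried yet) is a variant of
mem1 that reports its progress unbuffered, so that when it gets killed we can
see how far through the write it actually got, and compare that with what
free and vmstat show at that moment. Roughly this - untested:

#include <stdio.h>
#include <stdlib.h>

/* Like mem1, but report progress to stderr every 50MB (stderr is
 * unbuffered, so nothing is lost when the process is killed). */
int main(int argc, char **argv)
{
    size_t mb, i, bytes;
    char *p;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <megabytes>\n", argv[0]);
        return 1;
    }
    mb = (size_t)atol(argv[1]);
    bytes = mb * 1024 * 1024;

    p = malloc(bytes);
    if (p == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    for (i = 0; i < bytes; i += 4096) {
        p[i] = 1;                        /* one byte per 4kB page */
        if (i % (50UL * 1024 * 1024) == 0)
            fprintf(stderr, "touched %lu MB\n",
                    (unsigned long)(i / (1024 * 1024)));
    }
    fprintf(stderr, "finished\n");
    return 0;
}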

OK, anyway, have now verified that problem is easily reproducible.

Time to use a newer kernel, and see what has been fixed since 108.

OK, 17:23 19/11, now booted with 2.1.129.
Defaults in /proc/sys/vm are:
[root@fuslwa vm]# cat buffermem pagecache
5 25 60
5 30 75

With mem situation of:
fuslwa:~> free
total used free shared buffers cached
Mem: 516400 125736 390664 10752 56708 57976
-/+ buffers/cache: 11052 505348
Swap: 522992 0 522992

will now try mem1...:
fuslwa:~> time /home/njconway/memstuff/mem1 2000
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.010s 0:06.28 0.1% 0+0k 0+0io 75pf+0w
fuslwa:~> time /home/njconway/memstuff/mem1 950
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.010s 0:06.20 0.1% 0+0k 0+0io 75pf+0w

Hmm. More tries...
fuslwa:~> free
total used free shared buffers cached
Mem: 516400 127516 388884 14016 56772 58168
-/+ buffers/cache: 12576 503824
Swap: 522992 0 522992

fuslwa:~> time /home/njconway/memstuff/mem1 900
Allocated the memory
Writing to the memory
Finished
1.060u 38.530s 1:21.69 48.4% 0+0k 0+0io 13646pf+114692w

Aha. This worked. In fact, it took about half as long as before, even
allowing for only 900 megs vs. 950 before.

Straight afterwards, mem1 950 fails again with a 6-second delay, even though
free gave:
[root@fuslwa vm]# free
total used free shared buffers cached
Mem: 516400 49884 466516 3380 26212 16472
-/+ buffers/cache: 7200 509200
Swap: 522992 14744 508248

which meant roughly 950MB was available without even cutting back the cache
or buffers (which the kernel wouldn't shrink anyway, of course, since their
minimums are set to 5% apiece).

OK, now set them to 1% apiece, and retry mem1 950:
[root@fuslwa vm]# echo "1 25 60" > buffermem
[root@fuslwa vm]# echo "1 30 75" > pagecache

fuslwa:~> free
total used free shared buffers cached
Mem: 516400 50288 466112 3692 26212 16824
-/+ buffers/cache: 7252 509148
Swap: 522992 14700 508292
fuslwa:~> time /home/njconway/memstuff/mem1 950
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.010s 0:06.08 0.1% 0+0k 0+0io 89pf+0w

=> bug is still with us.

Now try to provoke it with a less huge memory request: first get the cache
and buffers well filled...
fuslwa:~> free
total used free shared buffers cached
Mem: 516400 513244 3156 3824 175280 323516
-/+ buffers/cache: 14448 501952
Swap: 522992 2900 520092
fuslwa:~> time /home/njconway/memstuff/mem1 950
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.000s 0:06.35 0.0% 0+0k 0+0io 80pf+0w
fuslwa:~> free
total used free shared buffers cached
Mem: 516400 513244 3156 3824 175280 323624
-/+ buffers/cache: 14340 502060
Swap: 522992 2900 520092

So, this time, the program didn't crash right away, and it didn't free ANY
memory. Hard to see the pattern here.

Now use the smaller memory request:

fuslwa:~> free
total used free shared buffers cached
Mem: 516400 513244 3156 3824 175280 323624
-/+ buffers/cache: 14340 502060
Swap: 522992 2900 520092
fuslwa:~> time /home/njconway/memstuff/mem1 600
Allocated the memory
Writing to the memory
Finished
0.780u 5.810s 0:15.97 41.2% 0+0k 0+0io 1902pf+31319w
fuslwa:~> free
total used free shared buffers cached
Mem: 516400 23688 492712 2236 5240 11788
-/+ buffers/cache: 6660 509740
Swap: 522992 13796 509196

So, this actually DID work, even though the 950meg case failed.

Hmm.

Use 850 next time, after refilling buffers/cache.
fuslwa:~> free
total used free shared buffers cached
Mem: 516400 513188 3212 3088 254340 232948
-/+ buffers/cache: 25900 490500
Swap: 522992 3296 519696
fuslwa:~> time /home/njconway/memstuff/mem1 850
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.030s 0:06.13 0.4% 0+0k 0+0io 84pf+0w
fuslwa:~> free
total used free shared buffers cached
Mem: 516400 394088 122312 3096 254340 113844
-/+ buffers/cache: 25904 490496
Swap: 522992 3296 519696

The above shows a failure which, for the first time, has occurred even WITH
memory being ejected from the cache - I don't recall ever seeing that before.

So try it again with 850 and see if it works this time:
fuslwa:~> free;time /home/njconway/memstuff/mem1 850;free
total used free shared buffers cached
Mem: 516400 394108 122292 3228 254340 113852
-/+ buffers/cache: 25916 490484
Swap: 522992 3284 519708
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.000s 0:11.61 0.0% 0+0k 0+0io 82pf+0w
total used free shared buffers cached
Mem: 516400 394108 122292 3228 254340 113852
-/+ buffers/cache: 25916 490484
Swap: 522992 3284 519708

WEIRD - it took twice as long as usual to fail.

fuslwa:~> free ; time /home/njconway/memstuff/mem1 850 ; free
total used free shared buffers cached
Mem: 516400 394112 122288 3228 254340 113852
-/+ buffers/cache: 25920 490480
Swap: 522992 3284 519708
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.050s 0:06.16 0.8% 0+0k 0+0io 82pf+0w
total used free shared buffers cached
Mem: 516400 387448 128952 3476 254404 107104
-/+ buffers/cache: 25940 490460
Swap: 522992 3260 519732

The above is back to the same duration, but again the cache has shrunk a bit.

ARGHGHGHGHGH: the duration of the failures is bogus I fear, since I was
accidentally running with my cwd on the NFS server fusrs3d. When I changed
to a local disk for it to dump the core file to, suddenly I got:

fuslwa:/home/njconway> free ; time /home/njconway/memstuff/mem1 850 ; free
total used free shared buffers cached
Mem: 516400 387600 128800 3564 254404 107160
-/+ buffers/cache: 26036 490364
Swap: 522992 3228 519764
Allocated the memory
Writing to the memory
Segmentation fault (core dumped)
0.000u 0.010s 0:00.02 50.0% 0+0k 0+0io 82pf+0w
total used free shared buffers cached
Mem: 516400 387732 128668 3564 254532 107164
-/+ buffers/cache: 26036 490364
Swap: 522992 3228 519764

So I can ignore duration when it comes to analysis (phew!).

Here we have a machine that should be able to dump all but roughly 10MB
worth of buffers/cache (the 1% minimums), and yet it is refusing an 850MB
program. It should allow programs of up to 490364 + 519764 - 10240 = 999888
kB to run (see the free output above), which is roughly 976MB.

This is bad.

Now look for the threshold: it failed at 800 but worked at 750:
fuslwa:/home/njconway> free ; time /home/njconway/memstuff/mem1 750 ; free
total used free shared buffers cached
Mem: 516400 387932 128468 3564 254724 107164
-/+ buffers/cache: 26044 490356
Swap: 522992 3224 519768
Allocated the memory
Writing to the memory
Finished
0.810u 23.490s 1:03.76 38.1% 0+0k 0+0io 9363pf+105573w
total used free shared buffers cached
Mem: 516400 168752 347648 1540 5240 157392
-/+ buffers/cache: 6120 510280
Swap: 522992 160368 362624

but note that while this was going on I used vmstat to watch the swapping:
[njconway@fuslwa vm]# vmstat 1
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 1 0 126516 3080 5240 14544 0 105 180 35 80 42 1 2 97
1 0 0 157460 2152 5240 30532 40 30976 59 7744 1790 2673 1 10 90
1 1 1 173224 3064 5240 14640 24 15772 6 3943 897 1362 1 15 84
1 0 1 204184 1188 5240 31136 4 30964 8 7749 1527 2222 0 6 94
0 1 0 220524 1060 5240 17236 20 16356 9 4089 738 1054 1 14 85
1 1 0 235156 2936 5240 29300 4 14628 29 3657 691 997 0 5 95
1 1 0 378688 688 5240 144404 76 155956 152 38992 14339 17810 0 48 52
1 0 0 394660 1228 5240 157360 24 3684 59 921 458 604 0 60 40
0 1 0 396700 1092 5240 157332 32 2064 22 517 362 476 0 62 38
0 1 0 398688 1124 5240 157332 0 1996 8 499 380 499 0 66 33
1 0 0 400700 1180 5240 157332 0 2016 0 504 373 507 0 68 32
0 1 0 402728 1276 5240 157332 0 2024 0 506 382 509 0 70 30
0 1 0 404760 1320 5240 157332 0 2032 0 508 371 506 0 71 29
1 0 0 407200 1156 5240 157288 0 2432 0 609 463 583 1 70 29
0 1 0 409224 1084 5240 157292 0 2032 4 508 378 506 0 71 29
1 0 0 411264 1168 5240 157292 0 2036 0 509 380 507 0 71 29
1 0 0 413292 1280 5240 157292 0 2028 0 507 385 510 0 71 28
0 1 0 415312 1352 5240 157292 0 2004 0 501 370 505 1 70 29
1 0 0 417424 1136 5240 157288 0 2124 0 531 380 513 0 70 30
1 0 0 419680 1252 5240 157288 0 2252 0 563 390 509 0 71 29
1 0 0 422576 1396 5240 157288 0 2904 0 726 435 500 0 63 37
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 425808 1148 5240 157288 0 3228 0 807 427 522 0 60 40
0 0 0 160300 347680 5240 157360 60 820 171 206 255 261 0 29 71
0 0 0 160300 347676 5240 157364 0 0 4 0 121 11 0 0 99

Note the early high swapout rate (the so column shows roughly 15-30MB per
second), during which vmstat was keeping up at more or less once per second.
That is followed by a HUGE delay in which a lot got swapped out - vmstat
stopped responding for ages here (well over 10 seconds) - and after this the
swapout rate falls to a pathetic 2MB per second or so.

This all bothers me.

Uh-oh: I just noticed that swap-used is still at around 160000 kB. This is
also very, very bad, as nothing should be in use (bar maybe 5MB or so).

I get:
[njconway@fuslwa vm]# cat /proc/swaps
Filename          Type        Size    Used    Priority
/dev/sda5         partition   130748  1424    2
/dev/sda6         partition   130748  65036   1
/dev/sdb6         partition   130748  1496    2
/dev/sdb7         partition   130748  92168   1
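
Since that stale swap usage is the sort of thing that's easy to miss,
something trivial that sums the Used column of /proc/swaps might be worth
leaving running in a loop to keep an eye on it over time. An untested sketch:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/swaps", "r");
    char header[256], name[256], type[64];
    long size, used, prio, total = 0;

    if (f == NULL) {
        perror("/proc/swaps");
        return 1;
    }
    /* skip the "Filename Type Size Used Priority" header line */
    if (fgets(header, sizeof(header), f) == NULL) {
        fclose(f);
        return 1;
    }
    while (fscanf(f, "%255s %63s %ld %ld %ld",
                  name, type, &size, &used, &prio) == 5) {
        printf("%-12s %7ld kB used of %7ld kB\n", name, used, size);
        total += used;
    }
    fclose(f);
    printf("total swap in use: %ld kB\n", total);
    return 0;
}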

