unexpected paging during large file reads in 2.1.127

David J. Fred (djf@ic.net)
Tue, 10 Nov 1998 12:39:51 -0500


Hi there,

Summary: When doing large file reads from disk, the system pages
unexpectedly, causing moderate to severe degradation in I/O
and overall system performance even though there is plenty of
memory.

Has anyone else noticed and/or characterized this specific behavior in
the 2.1 kernels? I found some messages in the archives where Linus
suggested a two line patch to mm/vmscan.c for what seemed to be a
related problem, but that failed to fix the problem for me.

I've been playing around without any instrumentation more
sophisticated than vmstat, but it's pretty clear something unwanted is
going on. It sort of looks like the problem crops up either when
cache is getting bigger than max_percent or buffers are falling below
min_percent. That's just a wildass guess (tm) based on the behavior
of the system after tweaking various things in /proc/sys/vm/*.

Details:

For starters, my machine is a dual PII system (ASUS P2B-D) with 256M
and several SCSI disks. I'm running stock 2.1.127 with SMP enabled,
compiled with gcc 2.7.2.3.

All values in /proc/sys/vm are the defaults (see the very bottom of
the message for the actual values). I have tried tweaking various
parameters in pagecache, buffermem, swapctl etc. without complete
success. It seems that each time I would make a little progress in
one way the system would start to misbehave in another. I thought
that for this report I would reboot and leave the parameters at their
default values since that's possibly more meaningful as a general
case.

The vmstat below shows the symptoms. The machine was more or less
freshly booted (<1 hour), running X11, normal daemons, etc. The
system had been sitting quietly since boot. The vmstat is
semi-annotated, but basically I created a 500M file with:

dd if=/dev/zero of=blah bs=1024k count=500

I then proceeded to do four successive invocations of:

cat blah >/dev/null

allowing several seconds to elapse between the end of one and the
beginning of the next. The write went okay. During the first read
pass the system started paging a little, and each subsequent read
produced more and more paging. The last two runs produced what I can
only characterize as thrashing. This is quite repeatable on my system.
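
For anyone who wants to repeat the test with timings attached, here is a
rough sketch of the same write-then-read-four-times procedure. It is not
what I actually ran (that was plain dd and cat); the function names
write_zero_file, timed_read and run_passes are made up for illustration,
and the 5 second pause is only a stand-in for "several seconds".

```python
import os
import tempfile
import time


def write_zero_file(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeros to path, roughly like dd if=/dev/zero."""
    block = bytes(block_size)
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = block[:min(block_size, remaining)]
            f.write(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before reading back


def timed_read(path, block_size=1024 * 1024):
    """Read the file sequentially and discard it, like cat file >/dev/null.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            total += len(data)
    return total, time.perf_counter() - start


def run_passes(path, passes=4, pause_seconds=5.0):
    """Run several read passes with a pause between them (pause is assumed)."""
    results = []
    for i in range(passes):
        results.append(timed_read(path))
        if i + 1 < passes:
            time.sleep(pause_seconds)
    return results
```

Calling write_zero_file(path, 500 * 1024 * 1024) and then run_passes(path)
would mirror the sizes used here. Run vmstat 3 in another window while
it goes; the per-pass times only show whether throughput is consistent,
not why.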

Anyway, here's the vmstat:

(~) djf@avo% vmstat 3
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id

0 0 0 0 187248 11716 26116 0 0 0 2 108 57 2 2 96
0 0 0 0 187248 11716 26116 0 0 0 0 136 82 2 1 96
0 0 0 0 187248 11716 26116 0 0 0 0 107 85 3 2 95
0 0 0 0 187248 11716 26116 0 0 0 0 104 43 2 1 97

[Start of write of 500M to disk]

0 1 1 0 127528 66180 26116 0 0 0 8093 142 51 1 36 63
0 1 1 0 41952 144324 26116 0 0 0 14722 205 109 2 55 43
0 1 0 0 28752 156964 26064 0 0 1 13518 206 119 1 40 59
0 2 1 0 28752 156932 26060 0 0 5 12833 174 65 2 34 64
0 3 0 0 29528 156084 26060 0 0 11 11333 164 149 3 37 60
0 2 0 0 29524 156244 26064 0 0 12 11167 169 71 1 31 68
0 1 0 0 31608 154348 26064 0 0 2 10423 160 55 1 25 74
0 1 0 0 28556 157132 26064 0 0 2 12000 164 63 1 32 67
0 1 0 0 29468 156300 26064 0 0 2 12333 174 61 2 33 65
0 1 1 0 31168 154732 26064 0 0 2 13125 161 53 1 36 63
0 1 0 0 28592 157100 26064 0 0 2 10000 161 53 1 25 74
0 1 1 0 30832 154948 26064 0 0 2 11973 155 55 1 31 68
0 0 0 0 29256 156556 26064 0 0 1 9000 158 53 2 24 74
0 2 0 0 29200 156620 26064 0 0 1 8920 133 65 1 5 94
0 2 0 0 29200 156620 26064 0 0 0 11947 150 64 2 5 93

[Start of first 500M read]

1 0 0 0 10144 156684 44936 0 0 6347 0 243 241 1 5 94
0 1 0 0 3708 125388 85128 0 0 13450 0 369 437 1 14 85
1 0 0 0 6384 87048 123872 0 0 12955 2 363 425 1 17 82
1 0 0 0 3796 52048 164396 0 0 13572 0 384 440 1 17 82
1 0 0 8 11108 13100 199364 0 3 11792 1 343 404 2 24 74
0 1 0 360 21080 13100 189784 0 117 9175 30 301 337 1 26 73
1 0 0 360 18744 13100 192248 0 0 13542 0 375 435 1 13 85
2 0 0 360 17068 13164 193976 0 0 12578 7 361 410 1 13 86
0 1 0 360 14904 13112 196348 0 0 13106 0 375 427 1 16 82
0 1 0 396 14796 13100 196664 0 21 12235 6 353 410 1 13 85
0 1 0 3180 20792 13100 193476 0 919 6467 230 302 333 0 30 70
1 0 0 3180 23932 13112 190392 0 0 12514 0 421 533 1 15 84
1 0 0 3180 22716 13108 191624 0 0 12234 0 550 790 1 15 84
1 0 0 3180 22300 13156 192008 0 0 12549 0 469 615 2 15 84
0 0 0 3172 25552 13100 188884 3 0 5022 0 311 385 1 12 87
0 0 0 3164 25536 13100 188892 3 0 3 1 126 85 3 2 95
1 0 0 3120 24832 13228 189356 25 0 169 0 178 238 3 8 89
0 0 0 3120 24832 13228 189356 0 0 4 0 105 35 2 2 96

[Second read start]

0 0 0 3120 24812 13228 189368 0 0 4 9 110 126 12 5 83
1 0 0 3120 19732 13292 194376 1 0 4826 0 219 214 1 5 94
0 1 0 5388 19920 13100 196660 0 756 7861 189 282 292 1 25 73
0 1 0 6056 27620 13100 189624 0 223 11683 56 575 815 1 22 77
1 0 0 6056 21984 13100 195268 0 0 12851 2 582 874 1 15 84
1 0 0 6056 21904 13164 195288 0 0 13652 0 444 573 1 17 82
1 1 0 6056 23248 13100 193932 1 0 11954 0 1012 1780 1 18 81
0 2 0 6096 20616 13100 196540 0 13 9612 4 859 1485 1 14 85
1 0 0 8096 24668 13100 194652 5 668 5409 167 280 314 1 26 73
0 1 0 8144 31068 13164 188312 1 16 12013 12 379 452 1 16 83
1 0 0 8144 23660 13100 195820 0 0 12996 1 676 1065 1 16 83
1 0 0 8144 28572 13100 190912 0 0 12355 0 897 1537 2 17 81
1 0 0 8144 28468 13100 191016 0 0 12561 0 573 847 1 14 85
0 1 0 8704 23796 13100 196304 0 187 9889 47 479 680 1 21 77
1 0 0 10340 27352 13100 194384 7 549 7991 138 361 426 1 24 74
0 1 0 10332 33036 13100 188688 4 0 11799 0 646 1007 1 14 85
0 0 0 10312 27852 13100 193916 7 0 10040 0 511 732 1 18 81
0 0 0 10280 27640 13100 194096 13 0 61 1 279 114 2 3 95
0 1 0 10272 26732 13164 194520 33 0 115 0 306 144 1 2 97

[Third read start]

0 1 0 10272 26616 13100 194760 3 0 12356 1 635 946 2 16 82
0 2 0 10260 28068 13100 193280 15 0 8562 11 953 1174 2 11 87
0 1 0 10888 25288 13100 196696 0 212 10135 53 721 976 2 37 61
0 1 0 11712 27092 13100 195732 57 289 6051 75 327 288 1 18 80
0 1 0 11736 32716 13100 190424 9 13 11422 10 737 925 2 19 79
1 0 0 11736 31712 13164 191368 0 0 12353 7 906 1474 1 18 81
1 0 0 11736 30784 13100 192400 0 0 12663 6 579 820 1 14 84
1 0 0 11736 28436 13100 194740 0 0 13254 0 505 720 1 17 82
0 1 0 12856 27824 13100 196476 5 381 9300 96 668 994 1 26 73
3 2 0 15984 32228 13100 195128 20 1048 7254 262 371 477 1 19 79
1 1 0 15972 40072 13100 187140 5 0 6685 0 408 581 2 10 88
1 0 0 15964 34728 13100 192680 8 0 8280 4 332 409 1 11 88
0 1 0 15964 33684 13100 193720 0 0 12426 0 759 1232 2 17 82
1 1 0 15964 35032 13100 192376 0 0 11932 4 905 1542 1 16 83
0 1 0 15952 37404 13100 189988 8 0 11357 1 816 1375 2 15 83
0 2 0 25136 40700 13100 195888 40 3123 6063 781 380 398 1 19 80
0 4 1 26144 41816 13100 195800 243 600 1573 150 322 323 0 7 92
1 2 0 25984 45248 13100 192188 324 169 2088 43 343 388 0 10 89
0 3 0 25648 42260 13100 194844 196 0 4612 0 382 528 1 11 87
0 0 0 24996 40972 13100 195528 377 0 501 0 241 311 1 3 95
0 0 0 24988 40924 13100 195568 11 0 7 0 105 33 2 1 97
1 1 0 24732 40104 13100 196148 148 0 172 1 181 176 1 3 95
0 0 0 24668 40004 13100 196168 23 0 8 0 126 99 2 5 93

[Fourth read start]

1 0 0 24636 46080 13100 190020 33 0 4655 0 300 384 3 7 90
0 1 0 24636 45476 13100 190628 0 0 13005 0 973 1677 1 16 83
1 0 0 24628 43240 13100 192844 5 0 11931 0 561 929 2 29 70
0 1 0 24628 41632 13164 194388 0 0 13315 2 422 548 1 14 86
0 3 0 24916 40104 13100 196292 31 119 9381 30 396 514 1 21 77
0 2 0 26064 40908 13100 196624 119 552 3171 138 397 339 0 9 90
0 5 0 26092 41064 13100 196504 309 227 1696 57 339 356 0 9 91
0 4 0 25736 44884 13100 192224 207 0 4351 0 341 459 1 7 91
0 2 0 25628 45868 13100 190944 56 0 4825 0 285 352 2 9 89
0 1 0 25600 45704 13100 191352 23 0 10640 0 609 960 1 15 83
0 1 0 25588 46852 13100 190192 8 0 11869 0 847 1440 1 17 81
0 1 0 25584 46480 13100 190560 0 0 12680 7 725 1175 2 18 80
0 1 0 25584 45016 13100 192024 0 1 12424 1 662 1048 2 16 82
1 4 0 26112 41368 13100 196224 168 327 3901 82 361 359 0 12 88
0 5 0 26116 41928 13100 195672 349 261 1497 65 354 380 0 8 92
0 3 0 25976 41016 13100 196428 272 167 2494 42 328 358 0 7 93
0 3 0 26040 41032 13100 196476 295 189 1674 47 326 367 0 10 89
0 3 0 25676 42196 13100 194940 204 0 5080 0 380 523 1 7 92
1 3 0 25556 41304 13100 195724 72 0 7133 0 463 678 1 13 86
0 2 0 25196 46492 13100 190164 181 0 3735 1 274 350 3 6 91
1 0 0 25148 40012 13100 196600 27 0 7859 0 604 660 2 11 87
0 1 0 25136 46324 13100 190264 0 0 12030 0 1056 1691 2 19 79
0 1 0 25132 40388 13100 196200 4 0 10857 1 895 1485 1 19 80
0 0 0 24644 48860 13100 187300 251 0 195 0 211 220 1 4 96

[End of last read]

0 0 0 24644 48860 13100 187300 0 0 0 0 101 30 2 2 96

It seems to me with 256M of memory and a working set size of ~70-80M I
shouldn't be seeing 24M swapped and thrashing from simply reading 2G
off disk. At the very least it's clear I'm not getting consistent
read performance on an otherwise unloaded system.
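
To make the pass-to-pass trend easier to see than by eyeballing the
columns, something like the following could total things up per phase.
This is only a sketch: it assumes the 16-column vmstat layout shown above
and that each phase begins at a bracketed annotation line, and the
function name summarize_phases is hypothetical. The si/so sums are sums
of the per-sample figures vmstat printed, so treat them as a relative
measure rather than exact totals.

```python
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]


def summarize_phases(text):
    """Group vmstat rows under the most recent [annotation] line.

    Returns one dict per phase with the sample count, summed si/so,
    average bi, and the swpd value of the last sample in the phase.
    Rows before the first annotation are labelled "idle".
    """
    phases = []
    current = {"label": "idle", "rows": []}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            if current["rows"]:
                phases.append(current)
            current = {"label": line[1:-1], "rows": []}
            continue
        fields = line.split()
        # Skip headers, prompts and blank lines: keep all-numeric rows only.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        current["rows"].append(dict(zip(COLUMNS, map(int, fields))))
    if current["rows"]:
        phases.append(current)

    summary = []
    for phase in phases:
        rows = phase["rows"]
        summary.append({
            "label": phase["label"],
            "samples": len(rows),
            "si_sum": sum(r["si"] for r in rows),
            "so_sum": sum(r["so"] for r in rows),
            "avg_bi": sum(r["bi"] for r in rows) / len(rows),
            "end_swpd": rows[-1]["swpd"],
        })
    return summary
```

Feeding it the vmstat text above and printing label, si_sum, so_sum and
avg_bi for each phase should show whether swap traffic rises and read
throughput falls from one pass to the next.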

Here are the values from /proc/sys/vm:

(/proc/sys/vm) root@avo% tail *
==> bdflush <==
40 500 64 256 15 3000 500 1884 2
==> buffermem <==
5 25 60
==> freepages <==
256 512 768
==> kswapd <==
512 32 32
==> overcommit_memory <==
0
==> pagecache <==
5 30 75
==> pagetable_cache <==
25 50
==> swapctl <==
20 3 1 3 32 4 8192 8192
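
As a rough check on the min_percent/max_percent guess, the sketch below
parses the tail output above and compares two figures from the vmstat
against it. Several things here are assumptions, not facts from my
system: that buffermem and pagecache are ordered (min_percent,
borrow_percent, max_percent), that the percentages are taken against
total memory, and that total memory is exactly 256M (the kernel's real
figure is somewhat lower). The helper names are made up.

```python
def parse_tail_output(text):
    """Parse `tail *` output ("==> name <==" then a value line) into a dict."""
    values = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("==>") and line.endswith("<=="):
            name = line[3:-3].strip()
        elif name and line:
            values[name] = [int(v) for v in line.split()]
            name = None
    return values


def percent_of_total(kb, total_kb):
    """Express a size in KB as a percentage of total memory in KB."""
    return 100.0 * kb / total_kb


VM_DEFAULTS = """\
==> bdflush <==
40 500 64 256 15 3000 500 1884 2
==> buffermem <==
5 25 60
==> freepages <==
256 512 768
==> kswapd <==
512 32 32
==> overcommit_memory <==
0
==> pagecache <==
5 30 75
==> pagetable_cache <==
25 50
==> swapctl <==
20 3 1 3 32 4 8192 8192
"""

TOTAL_KB = 256 * 1024  # assumption: the full 256M; real total is a bit lower

vm = parse_tail_output(VM_DEFAULTS)
buf_min, buf_borrow, buf_max = vm["buffermem"]  # assumed field order
pc_min, pc_borrow, pc_max = vm["pagecache"]     # assumed field order

# buff sits at 13100 for most of the read passes; cache keeps climbing
# back to roughly 196600 in the later passes.
buff_pct = percent_of_total(13100, TOTAL_KB)
cache_pct = percent_of_total(196600, TOTAL_KB)

print(f"buff  {buff_pct:.1f}% of RAM vs buffermem min {buf_min}%")
print(f"cache {cache_pct:.1f}% of RAM vs pagecache max {pc_max}%")
```

Under those assumptions both figures land very close to the configured
limits, which fits the guess but does not by itself show that those
limits are what triggers the paging.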
