Re: Page aging broken in 2.6

From: William Lee Irwin III
Date: Sun Dec 28 2003 - 11:38:05 EST


At some point in the past, I wrote:
>> Part of this is unrealistic; paging I/O being congested must be due to
>> paging itself causing seeks without additional I/O load. Reading a
>> single page once and then faulting that one page back into numerous
>> process address spaces is only one I/O request, and so cannot seek in
>> and of itself. So in this scenario, a convoy of processes on a single
>> page is plausible; aggravated paging I/O seekiness is not. Did you have
>> in mind some additional I/O load? Or do affected processes actually all
>> fault before the one I/O completes, and so all block temporarily?

On Sun, Dec 28, 2003 at 12:23:40PM +0100, Roger Luethi wrote:
> My previous message was meant as a warning of the assumption that
> the aggregated reference frequency is all that matters. I was merely
> pointing out how the number of processes referencing a page could affect
> performance as well. Reference frequency is used as an estimator for
> the _likelihood_ of a fault in the future, but the potential _impact_
> of a fault grows with the number of processes that may block on it.
> It is one possible (though not necessarily the most likely) explanation
> for the symptoms I see with 2.6.

I guess caution against LFU is uncontroversial.


On Sun, Dec 28, 2003 at 12:23:40PM +0100, Roger Luethi wrote:
> vmstat finds all processes blocked a lot more often in 2.6 than in
> 2.4, often for several seconds in a row. That only means something in
> comparison, of course, because it is anything but a precise measurement
> -- not only because of the 1 second snapshot granularity but also due
> to the fact that bookkeeping of running and blocked processes in the
> kernel is not accurate (processes may count as both blocked and running).
> Typical log snippet for a kernel build under some 2.6.0-test release:

I'm not convinced what vmstat gets out of 2.4 is entirely comparable to
what it gets out of 2.6. "blocked" and "running" are collected very
differently in 2.6. iowait shouldn't be collected on 2.4 at all.

This could probably be addressed by backporting 2.6's reporting methods
to 2.4 so the two kernels use similar reporting mechanisms.


On Sun, Dec 28, 2003 at 12:23:40PM +0100, Roger Luethi wrote:
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 9 3 6268 851814 1500 8992 440 0 996 348 1141 294 87 13 0 0
> 9 3 6164 852816 1540 9088 352 0 456 0 1045 145 91 9 0 0
> 9 6 6164 4044 853818 8112 60 0 100 28 1016 71 92 8 0 0
> 4 6 6604 854820 924 7432 532 472 784 488 1096 626 57 43 0 0
> 2 9 9248 3556 855921 6968 1044 2748 1640 2752 1283 412 74 13 0 13
> 3 7 9248 857071 924 6864 1208 0 1720 108 1326 524 60 34 0 6
> 10 8 11164 2080 858438 5952 1068 1944 2040 2064 1623 1655 74 26 0 0
> 0 11 13000 859563 356 5824 796 2032 1572 2036 1330 656 66 24 0 10
> 0 10 16608 4064 861037 5868 832 3960 1836 3964 1755 725 42 9 0 49
> 0 11 16604 862284 420 5920 1420 0 2216 4 1471 485 39 4 0 57
> 7 4 9772 10656 863286 6644 552 0 1344 12 1112 250 56 5 0 39
> 9 2 8228 864687 732 6960 296 0 632 108 1484 257 96 4 0 0
> 8 3 8212 10656 865689 7176 80 0 320 0 1050 146 95 5 0 0
> The trace above is not for the benchmark I referred to as kbuild in the
> past few weeks (it was taken under lighter load). Even so 2.6 exhibits
> significantly more periods with I/O wait and consequently takes longer
> than 2.4 to complete.

The oscillation in "free" and "buff" is very unusual. What is this
box doing?


On Sat, 27 Dec 2003 15:55:38 -0800, William Lee Irwin III wrote:
>> Can you capture sysrq t while a situation like this is in progress?

On Sun, Dec 28, 2003 at 12:23:40PM +0100, Roger Luethi wrote:
> What are you getting at? This may be easier for you to do because you
> know what you are looking for.

I'm not looking for anything per se. It will say what codepaths tasks
are blocked in and give an idea of what's going on around the system.
kgdb can do something similar that might be more useful, since data can
be examined also.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/