Memory Rusting Effect [re: Linux hostile to poverty]

Andrew Derrick Balsa (andrebalsa@altern.org)
Sat, 18 Jul 1998 13:55:10 +0200


Hello Bill, kernel hackers,

I would like to comment on your post because, although it is outside my domain
of competence (which is very limited: cpuid problems and tsc behaviour), I
think you have put your finger on a larger development issue.

First, the new console code problems with monochrome adapters in 2.1.109. This
I believe will be fixed as soon as one of the code's authors sits down in front
of a machine equipped with a monochrome adapter and really has a go at it. I
think this is part of the normal tuning/debugging cycle; a large and complex
feature such as the new console code takes a few kernel revisions to get
adequately debugged.

The "Memory Rusting Effect" however is something completely different. People
have been detecting memory fragmentation problems in the 2.1.x kernels for some
time now. Those problems don't exist on 2.0.x kernels.
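
For readers who haven't followed the thread: the symptom is external
fragmentation. Total free memory can look perfectly healthy while no
contiguous run of pages is large enough to satisfy a multi-page request.
Here is a toy user-space sketch in C (my own illustration, not kernel
code) of how the two measures diverge:

    /*
     * Illustrative sketch only -- not kernel code.  Simulates a map of
     * 64 physical pages where every other page is in use: half of the
     * memory is free, yet the largest contiguous free run is 1 page,
     * so any multi-page (higher-order) request must fail.
     */
    #include <stdio.h>

    #define NPAGES 64

    int main(void)
    {
        int used[NPAGES];
        int i, free_pages = 0, run = 0, largest = 0;

        for (i = 0; i < NPAGES; i++)
            used[i] = i & 1;            /* checkerboard allocation */

        for (i = 0; i < NPAGES; i++) {
            if (!used[i]) {
                free_pages++;
                run++;
                if (run > largest)
                    largest = run;
            } else {
                run = 0;
            }
        }

        printf("free pages: %d, largest contiguous run: %d\n",
               free_pages, largest);    /* prints 32 and 1 */
        return 0;
    }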

However, many developers believe that adding more layers of code to tackle
special cases will help solve the problem. I believe this is the wrong approach.
I have even seen somebody propose AI techniques (neural networks?) to solve
this particular problem. I'd say: plug a Cray into your Linux box just to run
the memory fragmentation heuristics, then...

Adding code to tackle the special cases increases code complexity, gains nothing
in terms of performance for the general case, and ultimately leads to statements
of the kind: "If you only have 8Mb of RAM on your Linux box, better stick to
2.0.x kernels". :-(

Your post is a landmark in this debate because, for the first time (as far as I
can tell), somebody has posted a good set of quantitative data describing the
problem, together with a simple methodology that just about anybody can use to
reproduce it on his/her Linux box.

The most interesting point is that from 2.1.100 -> 2.1.109 the problem got
_worse_, as people tried to solve it by adding yet more code...

A few (many) posts ago, Colin Plumb posted a very clear analysis of the problem
in general terms: in the present code, free memory blocks are not being given
enough time to recombine into larger ones. His analysis implies that adding
more heuristics to take care of special cases will only leave even shorter
periods for free memory blocks to recombine, and hence will _worsen_ the
problem instead of lessening it. That fits with your measurements.
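
To make his point concrete, here is a toy buddy-coalescing sketch (my own
simplification, not the kernel's actual page allocator). A freed block can
only merge with its "buddy" if the buddy is itself sitting free; hand freed
blocks back out too quickly and the merge never happens:

    /*
     * Sketch of buddy coalescing, in the spirit of the kernel's page
     * allocator but heavily simplified and purely illustrative.
     * Blocks are tracked by (index, order); freeing a block tries to
     * merge it with its buddy, doubling the block size each time.
     */
    #include <stdio.h>

    #define MAX_ORDER 4
    #define NBLOCKS   (1 << MAX_ORDER)     /* 16 order-0 blocks */

    /* free_at[order][index] is nonzero if that block is free */
    static char free_at[MAX_ORDER + 1][NBLOCKS];

    static void buddy_free(int index, int order)
    {
        while (order < MAX_ORDER) {
            int buddy = index ^ (1 << order);  /* flip the order-th bit */

            if (!free_at[order][buddy])
                break;                 /* buddy busy: no merge possible */

            /* Buddy is free too: pull it off and merge upward. */
            free_at[order][buddy] = 0;
            index &= ~(1 << order);    /* merged block starts at lower half */
            order++;
        }
        free_at[order][index] = 1;
        printf("freed block at %d, final order %d\n", index, order);
    }

    int main(void)
    {
        /* Free two order-0 buddies back to back: they merge to order 1. */
        buddy_free(0, 0);
        buddy_free(1, 0);

        /*
         * Colin's point: if block 1 had been reallocated the instant it
         * was freed -- as tends to happen under load -- the merge above
         * would never fire, and order-1 and larger blocks would slowly
         * "rust" away.
         */
        return 0;
    }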

Colin has proposed a _very_ simple, neat first step to minimize the problem,
but I guess it wasn't fancy enough to catch the interest of other developers. I
mean, who cares for simplicity?

Summarizing, the point is that the _simpler_, _shorter_, _easier_ to understand
code in 2.0.x kernels is behaving better - for the general case - than the more
complex, longer, and harder to understand (obfuscated?) code in 2.1.x. That is a
measurable fact, and no amount of debate will make it go away.

I hope somebody will realize that it's time to stop thinking of ways to
"improve" the swap code.

Perhaps even going back to the code in 2.0.x would be in order.

OK, I have said it.

Bye,
---------------------
Andrew D. Balsa
andrebalsa@altern.org
