'vectorized' (order-dependent) VM

MOLNAR Ingo (mingo@valerie.inf.elte.hu)
Wed, 15 Jul 1998 00:34:48 +0200 (MET DST)


On Tue, 14 Jul 1998, Linus Torvalds wrote:

> The only reason I wrote a program to do the calculations was that I was
> too lazy (and possibly too inept) to symbolically solve the value of
>
> x * (x-2) * (x-4) * ... (y times)
> ---------------------------------
>              x ^ y
>
> so I just wrote a program to iterate and do the calculations for me.
>
> Having 5% memory free almost guarantees that you'll have consecutive pages
> for an 8MB machine. It also showed very clearly that on a 4MB machine it
> was getting painful - you needed to keep a fair amount of your memory free
> to give the same kinds of guarantees.

i think the fundamental problem is that the (probability) model assumes
that we free pages 'randomly'. But this isn't a fundamental property, it's
just a property of the current Linux implementation. A random
single-free-page distribution means the distribution of free pages does
not depend much on the actual 'usefulness' of those free pages.
Looking behind the surface, one can see that the fundamental problem is
that our 'free-pool goal' is a scalar, while it should be an order-indexed
vector.

this causes the highly nonlinear and unpleasant distribution of
higher-order free pages in the above model, since our 'end-user goal' is
very non-scalar. (e.g. fork() needs 8k physically contiguous allocations,
etc.)

the straightforward way of solving this is to make the 'metric' of how
much memory we really have depend on the 'quality' of free blocks. We
already have such a metric, free_memory_available(nr), but it's not
yet order-dependent. The 'free memory metric' is _not_ the same as 'the
number of pages available right now'; it's something more complex, highly
nonlinear. (This whole problem is quite closely related to the
'goodness()' function in sched.c, btw.) To deal with 8k allocations,
we should make free_memory_available() depend on the goal of the query
(e.g. one more parameter, say 'free_memory_available(nr,order)'), and
make the kswapd wakeups and swapouts aware of the 'vectorized' goal as
well. It's all pretty straightforward.

once the free pool does not consist of randomly distributed free pages
(which might or might not randomly cluster into 8k-aligned 8k blocks), we
get as many free pages as specified by the _order-dependent_ goal vector.
It's no longer a simple scalar value of 'this much memory we need', but
rather a vector: each order has its own scalar value, which has to be
reached. swap_out() and kswapd operate on this vector, not on the
'one-dimensional' free-pages metric.

the whole VM consists of 3 major components: a 'freeing backend', an
'allocation frontend' and a 'freeness metric'. All of these components
have to follow the same concept. I've seen code before that tries to
'vectorize' certain components (mostly local hacks, done only where
it's easy), but we get the correct dynamics only if all 3 components
understand, manipulate and work towards the same 'vectorized goal'. The
easiest part is kswapd; occasionally we did some 'go a bit further if a
certain order's freelist is too low' stuff, but this again only fixes the
metric (the symptom ;), not the freeing engine and the allocation engine.
(both should be aware of the vectorized nature of the goal: the freeing
engine should know what effect a given freed page has on the goal, and the
allocation engine(s) should properly balance the goal itself.)

with such a 'vectorized' distribution, free pages are no longer randomly
placed free pages; whether a page is kept free implicitly depends on its
physical address, so you can no longer do the above multiplication of
independent probabilities to get the probability of a successful
high-order allocation. If you do the maths, you get much nicer values for
low-memory systems. (actually, we get the numbers we _want_, with certain
limitations, see below)

In this 'vectorized' model, only 'pinned-down' unswappable kernel-used
pages might prevent us from reaching the goal. (even this can be handled
in a statistical way, by never letting the number of _possible_ N-order
blocks go below a certain value; and this isn't the freelist, but a
collection of per-order 'free and possibly freeable' pages. Again, this
is a broad and generic balancing constraint in the allocation frontend,
not a random rule added somewhere.)

> That's all I'm saying. Essentially, I've tried to convince people with raw
> numbers that the VM layer doesn't actually need any major overhaul, it
> only needs to get the slight fixes. People that have talked about major
> overhauls haven't shown me either code _or_ reasoning, so..

this isn't a major overhaul, it's just a straightforward extension: going
from a slightly nonlinear scalar goal to a very nonlinear, order-indexed
vectorized goal when building the VM dynamics. It just happens to touch
90% of the VM code ;)

-- mingo
