> At present, when out (or almost out) of pages, Linux asks different
> sub-systems to release a page (yes, mostly only a page at a time).
> A sub-system normally (and rightly) gives up a page which it hasn't used
> recently.
> Unfortunately, when searching for a page to move into the free memory
> pool no weight is given to whether the released page will improve the
> quality of the memory-pool (such as allowing coalescing).
99.99% of the time we want a single page. try_to_free_page calls
shrink_mmap which walks the page list in page order releasing
pages from the page cache and buffer cache. The tendency is therefore
to release sequences of pages which could be coalesced. If the
parameters controlling kswapd are set reasonably, it should be kswapd
that does the real work rather than an allocation request.
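A toy model of why a scan in page-frame order tends to free runs of neighbouring pages. This is a minimal sketch, assuming a clock-style "second chance" policy in the spirit of shrink_mmap; struct toy_page, toy_shrink() and toy_count_pairs() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor (not the kernel's struct page). */
struct toy_page {
    bool in_cache;    /* page belongs to the page or buffer cache */
    bool referenced;  /* set on access, cleared by the scan */
    bool free;        /* page has been returned to the free pool */
};

/* Clock-style scan in page-frame order: referenced cache pages get their
 * bit cleared (a second chance), unreferenced cache pages are released.
 * Starts at *hand, visits at most 2*n pages, stops after `wanted` frees.
 * Returns the number of pages freed. */
size_t toy_shrink(struct toy_page *pages, size_t n, size_t *hand, size_t wanted)
{
    size_t freed = 0;
    assert(n > 0 && *hand < n);
    for (size_t step = 0; step < 2 * n && freed < wanted; step++) {
        struct toy_page *p = &pages[*hand];
        *hand = (*hand + 1) % n;
        if (!p->in_cache || p->free)
            continue;
        if (p->referenced) {
            p->referenced = false;   /* second chance */
            continue;
        }
        p->in_cache = false;
        p->free = true;
        freed++;
    }
    return freed;
}

/* Count order-0 buddy pairs (pfn, pfn + 1), pfn even, that are both free
 * and could therefore be merged into an order-1 block. */
size_t toy_count_pairs(const struct toy_page *pages, size_t n)
{
    size_t pairs = 0;
    for (size_t pfn = 0; pfn + 1 < n; pfn += 2)
        if (pages[pfn].free && pages[pfn + 1].free)
            pairs++;
    return pairs;
}
```

With eight unreferenced cache pages and the hand at 0, asking for four pages frees frames 0 to 3, which form two mergeable pairs.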
> But they are not freed immediately. Instead, update the necessary pte's
> (and anon structs), place the pages on a reclaim list, and tell the
> paging out daemon to start.
That is more or less what we have. Pages are kept in the page cache
or buffer pool as long as possible, even when unused. Kswapd keeps
the actual free lists topped up to minimise overhead on allocation
requests, and requests fall back to calling try_to_free_page
themselves if necessary.
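The division of labour described above (a background daemon refilling the free lists, with allocations reclaiming synchronously only as a last resort) can be sketched with counters alone. This is a minimal sketch, assuming a single low watermark; struct toy_pool, toy_alloc_page(), toy_kswapd_run() and toy_try_to_free_page() are hypothetical stand-ins, not the kernel's kswapd or try_to_free_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: counters only, no real pages. */
struct toy_pool {
    size_t nr_free;        /* pages on the free lists */
    size_t low_water;      /* below this, wake the background daemon */
    size_t reclaimable;    /* clean cache pages that could be released */
    bool daemon_woken;     /* stands in for waking kswapd */
};

/* Stand-in for try_to_free_page(): move one cache page to the free lists. */
static bool toy_try_to_free_page(struct toy_pool *p)
{
    if (p->reclaimable == 0)
        return false;
    p->reclaimable--;
    p->nr_free++;
    return true;
}

/* Background daemon: refill the free lists up to the watermark. */
void toy_kswapd_run(struct toy_pool *p)
{
    while (p->nr_free < p->low_water && toy_try_to_free_page(p))
        ;
    p->daemon_woken = false;
}

/* Allocation: normally just take a free page and wake the daemon if the
 * pool dips below the watermark; reclaim synchronously only when empty. */
bool toy_alloc_page(struct toy_pool *p)
{
    assert(p != NULL);
    if (p->nr_free == 0 && !toy_try_to_free_page(p))
        return false;              /* genuinely out of memory */
    p->nr_free--;
    if (p->nr_free < p->low_water)
        p->daemon_woken = true;
    return true;
}
```

If the daemon keeps up, the synchronous path in toy_alloc_page() is rarely taken, which is the point made above about kswapd doing the real work.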
> OK, I've simplified the above quite a bit. But do you get the idea?
> If you think you've seen the above before, you're correct. It's what
> some other UNIX OSes do, and for good reasons.
Linux is not entirely dissimilar, but it is optimised more for the
"sufficient memory" case: there is no way to go from a page to
the page tables it is used in.
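What Linux lacks here is a reverse mapping from a page to the PTEs that map it, which the quoted scheme (updating PTEs and anon structs before reclaim) relies on. A minimal sketch, assuming a small fixed number of mappings per page; toy_pte_t, struct toy_rpage, toy_rmap_add() and toy_rmap_unmap_all() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPPINGS 4

typedef uint32_t toy_pte_t;        /* hypothetical PTE: 0 means not present */

/* Hypothetical page with a reverse map: the PTEs currently mapping it. */
struct toy_rpage {
    toy_pte_t *mappers[TOY_MAX_MAPPINGS];
    size_t nr_mappers;
};

/* Install `value` in *pte and record that *pte maps this page.
 * Returns 0 on success, -1 if the reverse map is full. */
int toy_rmap_add(struct toy_rpage *pg, toy_pte_t *pte, toy_pte_t value)
{
    assert(pte != NULL);
    if (pg->nr_mappers == TOY_MAX_MAPPINGS)
        return -1;
    *pte = value;
    pg->mappers[pg->nr_mappers++] = pte;
    return 0;
}

/* With a reverse map, evicting a page is direct: clear every PTE that maps
 * it. Without one, the VM has to walk page tables to find those PTEs.
 * Returns the number of mappings removed. */
size_t toy_rmap_unmap_all(struct toy_rpage *pg)
{
    size_t n = pg->nr_mappers;
    for (size_t i = 0; i < n; i++)
        *pg->mappers[i] = 0;
    pg->nr_mappers = 0;
    return n;
}
```

The cost is an extra pointer per mapping, which is exactly the overhead a "sufficient memory" design avoids.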
> and improves the performance of swap (when reading
> back, we read several pages back: locality of
> reference on a non-page granularity).
Linux also does page ahead.
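One common way to read several pages back on a swap fault is to read the aligned cluster of swap slots around the faulting one. This is a minimal sketch, assuming a power-of-two window and a per-slot in-use map; toy_swap_readahead() is a hypothetical helper and not necessarily the policy Linux uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cluster read: on a fault at swap slot `slot`, queue the
 * aligned window of `window` slots (a power of two) containing it,
 * skipping unused slots. Writes slot numbers to out[] (which must hold
 * `window` entries) and returns how many were queued. */
size_t toy_swap_readahead(size_t slot, size_t window, const unsigned char *in_use,
                          size_t nr_slots, size_t *out)
{
    assert(window > 0 && (window & (window - 1)) == 0);
    size_t start = slot & ~(window - 1);   /* round down to window boundary */
    size_t count = 0;
    for (size_t s = start; s < start + window && s < nr_slots; s++)
        if (in_use[s])
            out[count++] = s;
    return count;
}
```

A fault at slot 10 with an 8-slot window queues slots 8 to 15, minus any that are not in use.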
> iii) Identify pages that allow coalescing in the free
> memory pool (actually not efficient to do and maybe
> not worthwhile, but the ability is there for small
> memory systems).
I reclaim unused pages from the page cache when coalescing, and it
seems worthwhile. It needs more work on mutual exclusion, though, so
that we can do it at interrupt time as well.
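Reclaiming while coalescing amounts to this: when a free block's buddy (at pfn XOR 2^order) holds only unused cache pages, release them and merge. This is a minimal sketch, assuming a per-page state array; it keeps no free lists and takes no locks, so it says nothing about the interrupt-time problem. enum toy_state, toy_buddy(), toy_reclaim_block() and toy_coalesce() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_FREE, TOY_CACHE_UNUSED, TOY_IN_USE };

/* Buddy of the order-`order` block starting at pfn (pfn aligned to 2^order). */
size_t toy_buddy(size_t pfn, unsigned order)
{
    return pfn ^ ((size_t)1 << order);
}

/* If no page in [start, start + 2^order) is in use, release any unused
 * cache pages there (mark them free) and return true; else return false. */
static bool toy_reclaim_block(enum toy_state *st, size_t start, unsigned order)
{
    size_t len = (size_t)1 << order;
    for (size_t i = start; i < start + len; i++)
        if (st[i] == TOY_IN_USE)
            return false;
    for (size_t i = start; i < start + len; i++)
        st[i] = TOY_FREE;
    return true;
}

/* Starting from the free page at pfn, keep merging with buddies,
 * reclaiming unused cache pages in them, up to max_order. Returns the
 * order of the resulting free block. n must be a multiple of 2^max_order. */
unsigned toy_coalesce(enum toy_state *st, size_t n, size_t pfn, unsigned max_order)
{
    unsigned order = 0;
    assert(pfn < n && st[pfn] == TOY_FREE);
    while (order < max_order) {
        size_t buddy = toy_buddy(pfn, order);
        if (buddy >= n || !toy_reclaim_block(st, buddy, order))
            break;
        pfn &= ~((size_t)1 << order);   /* merged block starts at lower buddy */
        order++;
    }
    return order;
}
```

With frames {free, cache, cache, free, in use, free, free, free}, coalescing from frame 0 reclaims frames 1 and 2 and stops at order 2, because frame 4 is in use.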
> See my slab allocator on http://www.nextd.demon.co.uk/
> It only frees pages when asked to do by the VM sub-system, and avoids
> a lot of the coalescing problems. Plus it's L1 cache friendly, and
> allows a state to be maintained in free objects (structures). No doc
> available (sorry); walk the code.
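The "state maintained in free objects" idea means the constructor runs once, when an object is first created, and freed objects go back on a free list still in their constructed state. This is a minimal sketch, assuming malloc() as the backing store instead of pages carved into slabs; struct toy_obj, struct toy_cache, toy_cache_alloc() and toy_cache_free() are hypothetical and do not reflect the layout of the actual slab code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type whose initialisation we want to amortise. */
struct toy_obj {
    struct toy_obj *next_free;  /* free-list link, only valid while free */
    int initialised;            /* state set once by the constructor */
    int lock;                   /* e.g. a lock that must be released on free */
};

struct toy_cache {
    struct toy_obj *free_list;
    size_t ctor_calls;          /* how many times the constructor ran */
};

static void toy_ctor(struct toy_cache *c, struct toy_obj *o)
{
    o->initialised = 1;
    o->lock = 0;
    c->ctor_calls++;
}

/* Allocate: reuse a free object as-is if there is one, else construct one. */
struct toy_obj *toy_cache_alloc(struct toy_cache *c)
{
    struct toy_obj *o = c->free_list;
    if (o) {
        c->free_list = o->next_free;
        return o;               /* already in constructed state */
    }
    o = malloc(sizeof(*o));
    if (o)
        toy_ctor(c, o);
    return o;
}

/* Free: the caller must hand back the object in its constructed state
 * (lock released); the constructor is not run again on reuse. */
void toy_cache_free(struct toy_cache *c, struct toy_obj *o)
{
    assert(o->initialised && o->lock == 0);
    o->next_free = c->free_list;
    c->free_list = o;
}
```

Whether skipping the constructor pays off depends on how expensive initialisation is compared with the cache footprint of keeping objects warm, which is the trade-off questioned in the reply.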
Yes, I've been looking at it. There are some interesting ideas, but it
is relatively slow by my current standards :-). PCs have all too
little L1 cache; all but the oldest processors have enough pipelining
to initialise a structure pretty quickly; and conditional branches
are a killer, even with Cyrix branch prediction. It may be
possible to optimise slab, though...
(It still does the silly power-of-two sizing though, doesn't it?)
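The cost of power-of-two sizing is easy to put in numbers: each object is rounded up to the next power of two, so a page holds fewer objects than exact-size packing would allow. This is a minimal sketch that ignores alignment and per-slab bookkeeping; toy_pow2_class() and the two counting helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= size (size >= 1), i.e. the size class a
 * power-of-two allocator would hand out for a request of `size` bytes. */
size_t toy_pow2_class(size_t size)
{
    size_t c = 1;
    assert(size >= 1);
    while (c < size)
        c <<= 1;
    return c;
}

/* Objects per page when each is rounded up to its power-of-two class. */
size_t toy_objs_per_page_pow2(size_t size, size_t page)
{
    return page / toy_pow2_class(size);
}

/* Objects per page when packed at their exact size. */
size_t toy_objs_per_page_exact(size_t size, size_t page)
{
    return page / size;
}
```

For a 192-byte object in a 4096-byte page that is 16 objects with power-of-two classes versus 21 packed exactly, since a quarter of each 256-byte slot is wasted.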
Mike
-- 
Mike Jagdis                     Internet: mailto:mike@roan.co.uk
Roan Technology Ltd.
54A Peach Street, Wokingham     Telephone: +44 118 989 0403
RG40 1XG, ENGLAND               Fax: +44 118 989 1195