Re: 2.2.0 wishlist

Stephen C. Tweedie (sct@dcs.ed.ac.uk)
Fri, 14 Jun 1996 22:09:20 +0100


Hi,

On Fri, 14 Jun 1996 19:19:09 +0200 (MET DST), Marek Michalkiewicz
<marekm@i17linuxb.ists.pwr.wroc.pl> said:

> Stephen Tweedie:
>> Look again. This is implemented in 2.0. See linux/drivers/char/random.c,
>> function secure_tcp_sequence_number().

> Yes, there is secure_tcp_sequence_number(), but it is not used
> anywhere,

Oops, my mistake...

> and the tcp code still uses do_gettimeofday(). I guess it was too
> late before 2.0 to change that... Too bad, it would be another
> argument for NOT making the random driver optional :-).

But bear in mind that IP itself is still optional. :)

>> Impossible. DMA *needs* contiguous memory, and how you allocate

> I know. But perhaps it would be possible to change the way memory
> is allocated so that it tries to minimize fragmentation.

It already does. We organise memory as an exponential buddy heap, and
always allocate already-fragmented pages in preference to breaking up
contiguously-aligned free space.
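To make that policy concrete, here is a small user-space model of an exponential buddy heap's allocation order (illustrative only: `free_count[]` and `alloc_order()` are stand-ins echoing the kernel's free_area lists, not actual kernel source). The search starts at the smallest order that fits, so already-split fragments are consumed before any larger contiguous block is broken up:

```c
/* User-space model of an exponential buddy heap's allocation policy.
 * Illustrative sketch, not kernel code. */
#include <assert.h>

#define MAX_ORDER 5               /* orders 0..4: blocks of 1,2,4,8,16 pages */

static int free_count[MAX_ORDER]; /* free blocks at each order */

/* Allocate 2^order pages: take the smallest free block that fits,
 * splitting a larger one only when nothing smaller is available. */
static int alloc_order(int order)
{
    for (int o = order; o < MAX_ORDER; o++) {
        if (free_count[o] > 0) {
            free_count[o]--;
            /* Splitting leaves one free buddy at each intermediate
             * order on the way back down. */
            while (o > order)
                free_count[--o]++;
            return 0;             /* success */
        }
    }
    return -1;                    /* no block large enough */
}
```

With one free page and one free 16-page block, two single-page requests first consume the lone page, and only then split the large block.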

> I am not an expert here, so please tell me if it totally doesn't
> make sense, but here are a few ideas (unless it is already done this
> way):

> - when allocating a single non-DMA free page, try to find one which
> has the highest physical address, to keep as much free contiguous
> low memory as possible

That is certainly worth looking at. We don't do this yet.

> - when allocating a larger non-DMA area (consisting of several
> pages), allocate each page as described above, then modify the
> page tables so that they appear contiguous in the virtual address
> space (even if they are not physically contiguous)

We already try to do this where possible. The kernel function
vmalloc() does precisely this, and it picks up single isolated free
pages in preference to breaking blocks of free space, helping to
reduce fragmentation. Trouble is, very little code in the kernel ever
needs larger non-DMA pages, and the primary place where we do need it
is when managing NFS packets with a block size >=4k. Networking is a
special case --- it needs to allocate memory at interrupt time, so we
can't muck around too much with page tables for these packets.
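The vmalloc() trick can be sketched in user space like this (a model of the idea only: `vmalloc_sketch()` and `vm_addr()` are hypothetical names, and the array of page pointers stands in for the real page tables). The point is that a large request is satisfied from single, possibly scattered physical pages, with contiguity restored purely in the virtual mapping:

```c
/* User-space sketch of the vmalloc() idea: satisfy a large request
 * from scattered single pages and let the "page tables" (here just
 * an array of pointers) provide virtual contiguity. Illustrative
 * names; not kernel source. */
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

struct vm_area {
    size_t npages;
    char **pages;                 /* stands in for page-table entries */
};

/* Grab npages single pages individually; each may come from a
 * different spot, so no contiguous physical run is needed. */
static struct vm_area *vmalloc_sketch(size_t npages)
{
    struct vm_area *vm = malloc(sizeof(*vm));
    if (!vm)
        return NULL;
    vm->npages = npages;
    vm->pages = calloc(npages, sizeof(char *));
    for (size_t i = 0; i < npages; i++)
        vm->pages[i] = malloc(PAGE_SIZE);   /* one page at a time */
    return vm;
}

/* Byte access through the "mapping": offset -> (page, offset). */
static char *vm_addr(struct vm_area *vm, size_t off)
{
    return vm->pages[off / PAGE_SIZE] + off % PAGE_SIZE;
}
```

In the kernel the mapping is done once in the page tables and accesses are then ordinary pointer dereferences, which is also why it is unsuitable at interrupt time.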

We talked about this in Berlin, and there are a number of things we can
do to help in the short term, but the long term solution will actually
be to avoid doing large allocations at all, eliminating another copy
from the data path.

> - __get_free_pages() currently quits calling try_to_free_page()
> as soon as (nr_free_pages > reserved_pages) even though this does
> not mean that there are enough free contiguous pages. The code
> from linux/mm/page_alloc.c (lines 193-203) looks like this:

> ...
> I haven't tried that yet, but how about something like this instead:

> [code deleted]

> This way (except for GFP_BUFFER and GFP_ATOMIC), if the first
> attempt fails, we keep calling try_to_free_page() until it fails and
> breaks the loop. Or would it cause a lockup under some
> circumstances?

It's not the right way to do it. We want to keep get_free_pages() as
fast and lean as possible, since it is timing-critical. If we want to
do guarantees for memory allocation, then this should go into a
wrapper function which calls get_free_pages() and does some other
processing if that returns a failure. That way we'd be able to deal
separately with the two distinct cases, one where we just want to
recover from the out-of-memory error and return an error to the
application, and the other when we want to retry.
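The wrapper idea might look something like this (a user-space sketch under stated assumptions: `get_free_pages()`, `try_to_free_page()` and `get_free_pages_wait()` here are stand-in models, not the kernel functions, and the page counters fake the VM state). The fast path stays lean; the retry policy lives entirely in the wrapper:

```c
/* Sketch of the proposed split: a lean allocator plus a wrapper that
 * retries by reclaiming memory. All names and state are stand-ins
 * for illustration, not kernel code. */
#include <assert.h>
#include <stddef.h>

static int pages_reclaimable = 3;   /* fake "swappable" pages */
static int pages_free = 0;

static void *get_free_pages(void)   /* fast path: no retry logic */
{
    if (pages_free > 0) {
        pages_free--;
        return (void *)1;           /* dummy non-NULL page */
    }
    return NULL;
}

static int try_to_free_page(void)   /* reclaim one page; 0 on failure */
{
    if (pages_reclaimable > 0) {
        pages_reclaimable--;
        pages_free++;
        return 1;
    }
    return 0;
}

/* Wrapper: retry until the allocation succeeds or reclaim gives up.
 * Callers that can tolerate failure call get_free_pages() directly. */
static void *get_free_pages_wait(void)
{
    void *page;
    while ((page = get_free_pages()) == NULL)
        if (!try_to_free_page())
            return NULL;            /* truly out of memory */
    return page;
}
```

This keeps the two cases separate: return the failure to the application, or block and retry, without slowing the common path.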

Just continuously calling try_to_free_page() is not necessarily a good
idea, since it won't guarantee to free memory on the correct
boundaries and you may end up having to swap out a LOT of data before
you succeed. Bear in mind also that modern SCSI devices do
scatter-gather, and so don't necessarily need a contiguous buffer.
For those drivers that do, the current mechanism which requires DMA
reservation on driver registration is probably the cleanest solution.
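For illustration, scatter-gather amounts to handing the device a list of (address, length) segments instead of one contiguous buffer. A minimal sketch of the idea (the `sg_entry` struct and `sg_transfer()` are hypothetical names, and the loop stands in for the DMA engine, not any real driver interface):

```c
/* Sketch of the scatter-gather idea: the controller walks a list of
 * (address, length) segments, so the buffer need not be physically
 * contiguous. Illustrative only. */
#include <assert.h>
#include <string.h>

struct sg_entry {
    const char *addr;
    size_t len;
};

/* Model a DMA engine draining the segment list in order. */
static size_t sg_transfer(char *dst, const struct sg_entry *sg, int n)
{
    size_t total = 0;
    for (int i = 0; i < n; i++) {
        memcpy(dst + total, sg[i].addr, sg[i].len);
        total += sg[i].len;
    }
    return total;
}
```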

> The problem is not limited to DMA - I got some failures with ncpfs
> directory cache allocations, which I believe are caused by the above
> discussed __get_free_pages() behaviour.

If ncpfs is doing large-area memory allocations using get_free_pages,
then the problem is probably right there --- you should be doing
vmalloc() instead.

Cheers,
Stephen.

--
Stephen Tweedie <sct@dcs.ed.ac.uk>
Department of Computer Science, Edinburgh University, Scotland.