Re: 2.4 MM overview?

From: Philipp Rumpf (prumpf@parcelfarce.linux.theplanet.co.uk)
Date: Mon Oct 16 2000 - 07:42:27 EST

Next message: Stefano Mason: "Re: Oops 2.2.x"
Previous message: Gábor Lénárt: "Re: [Criticism] On the discussion about C++ modules"
In reply to: Kenn Humborg: "RE: 2.4 MM overview?"
Next in thread: Kenn Humborg: "RE: 2.4 MM overview?"
Reply: Kenn Humborg: "RE: 2.4 MM overview?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Oct 16, 2000 at 12:54:33PM +0100, Kenn Humborg wrote:
> > > We've kind of got 1.5-level page tables. There are actually 3 page
> > > tables. The system page table maps memory starting at 0x80000000.
> > > The P0 process page table maps from 0x0 up and the P1 process page
> > > table maps from 0x7fffffff down.
> >
> > And they have to be physically contiguous I guess ?
>
> The system page table must be physically contiguous. The process tables
> are actually referred to via virtual addresses, so they only have to
> be virtually contiguous in system space.

Oh. That sounds a lot easier then.
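
To make the three-table layout concrete, here is a minimal sketch of how a
32-bit virtual address splits up with 512-byte pages. The names
(vax_region_of, vax_vpn, vax_pte_offset) are made up for illustration and
are not taken from any existing port; the bit layout is the standard VAX
one as I understand it (region in bits 31:30, page number in bits 29:9).

```c
#include <stdint.h>

/* Bits 31:30 of a virtual address select the region. */
enum vax_region { VAX_P0 = 0, VAX_P1 = 1, VAX_S0 = 2, VAX_RESERVED = 3 };

#define VAX_PAGE_SHIFT 9u          /* 512-byte hardware pages */
#define VAX_VPN_MASK   0x1fffffu   /* 21-bit virtual page number */
#define VAX_PTE_SIZE   4u          /* bytes per hardware PTE */

/* Which of the three page tables translates this address? */
static enum vax_region vax_region_of(uint32_t va)
{
    return (enum vax_region)(va >> 30);
}

/* Virtual page number within the region (bits 29:9). */
static uint32_t vax_vpn(uint32_t va)
{
    return (va >> VAX_PAGE_SHIFT) & VAX_VPN_MASK;
}

/* Byte offset of the PTE from the region's page table base. */
static uint32_t vax_pte_offset(uint32_t va)
{
    return vax_vpn(va) * VAX_PTE_SIZE;
}
```

Because the table is a flat array indexed by page number, a process that
touches the top of P0 needs PTE slots for everything below it, which is
where the cost of sparse address spaces comes from.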

> > > This means that sparse address spaces are going to be _really_ expensive
> > > on PTEs. I don't know how much of a problem this is going to be yet,
> > > but I'm sure it's going to be fun :-)
> >
> > 512 byte pages, 4 bytes per pte ? Ouch. Can you fill the TLB manually ?
>
> That's not the worst! Considering the 4-byte PTE and the 40-byte mem_map_t,
> our memory management overhead is at least 44 bytes/page or 8.5%!

Use a logical page size of 4 KB.
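
A rough sketch of the arithmetic, using only the numbers quoted above
(4-byte PTE, 40-byte mem_map_t, 512-byte hardware page). The function name
and the assumption that you keep one mem_map_t per logical page are mine:

```c
#define HW_PAGE_SIZE 512u   /* hardware page size in bytes */
#define HW_PTE_SIZE  4u     /* bytes per hardware PTE */
#define MEM_MAP_SIZE 40u    /* bytes per mem_map_t, as quoted above */

/*
 * Per-page bookkeeping overhead in hundredths of a percent (truncated).
 * Assumes one mem_map_t per logical page and one hardware PTE per
 * 512-byte hardware page inside it.
 */
static unsigned mm_overhead_bp(unsigned logical_page_size)
{
    unsigned hw_pages = logical_page_size / HW_PAGE_SIZE;
    unsigned overhead = MEM_MAP_SIZE + hw_pages * HW_PTE_SIZE;

    return overhead * 10000u / logical_page_size;
}
```

mm_overhead_bp(512) gives 859 (44 bytes per 512, about 8.59%) and
mm_overhead_bp(4096) gives 175 (72 bytes per 4096, a bit under 1.76%).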

> We are formulating cunning plans of aggregating 2, 4 or 8 pages together
> into "bigpages", telling the arch-independent code that we've got
> larger pages than we really have and manipulating multiple PTEs in the
> set_pte() primitive and friends.
>
> We don't know how feasible this is yet..

Why wouldn't it be feasible?
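
Something along these lines is what I'd imagine, as a sketch only. The
function name set_logical_pte, the 8-to-1 ratio and the assumption that the
frame number sits in the low 21 bits of the PTE are all illustrative; check
them against the real PTE format before relying on any of it:

```c
#include <stdint.h>

typedef uint32_t hw_pte_t;

#define HW_PTES_PER_LOGICAL 8u           /* 4 KB logical / 512-byte hardware */
#define HW_PFN_MASK         0x001fffffu  /* assumed: frame number in bits 20:0 */

/*
 * Fill the 8 consecutive hardware PTEs that back one logical page.
 * ptep points at the first of the 8 entries, logical_pfn is the 4 KB
 * frame number, and flags carries the valid/protection bits unchanged.
 */
static void set_logical_pte(hw_pte_t *ptep, uint32_t logical_pfn,
                            uint32_t flags)
{
    uint32_t hw_pfn = logical_pfn * HW_PTES_PER_LOGICAL;
    unsigned i;

    for (i = 0; i < HW_PTES_PER_LOGICAL; i++)
        ptep[i] = (flags & ~HW_PFN_MASK) | ((hw_pfn + i) & HW_PFN_MASK);
}
```

The arch-independent code would then only ever see one "PTE" per 4 KB page;
the harder part is probably reading state back (e.g. merging modified bits
from the 8 entries), which this sketch does not attempt.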

> > OTOH, I think mapping all physical memory makes sense with the three page
> > table setup.
>
> It might and it might not. Expanding the system page table is pretty
> much out of the question because it needs to be physically contiguous.

Agreed.

> So we need to allocate system PTEs for the following at boot time:
>
> 1. Map all physical memory pages
> 2. Spare PTEs for mapping I/O space via ioremap().
> 3. Spare PTEs for vmalloc()
4. Spare PTEs for making user process page tables virtually contiguous. Note
that this effectively gives you a two-level page table. (Actually, a 3-level
page table, with 2 pmds per pgd, 4K PTEs per 3rd-level page table, and 512
bytes per page.)

So, here's what I'm proposing:

Your pgd is 2x4 bytes, and is in kernel virtual memory; the words are
pointers to (the mappings of) 4K PTEs in your system page table. The
PTEs are used to map your user space table virtually.

So the size of your system page table would need to be:

    physical memory / 512 * 4 +
    total vmalloced memory / 512 * 4 +
    total ioremapped memory / 512 * 4 +
    NR_TASKS * 16K
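
The same formula as a small sketch you can plug numbers into. The function
names are made up, and I'm reading the "16K" per task as 16 KB of system
page table, which is an assumption about units:

```c
#include <stdint.h>

#define HW_PAGE_SIZE   512u             /* hardware page size in bytes */
#define HW_PTE_SIZE    4u               /* bytes per hardware PTE */
#define PER_TASK_BYTES (16u * 1024u)    /* "NR_TASKS * 16K", assumed bytes */

/* Bytes of system page table needed to map mapped_bytes of memory. */
static uint64_t pte_bytes(uint64_t mapped_bytes)
{
    return mapped_bytes / HW_PAGE_SIZE * HW_PTE_SIZE;
}

/* Total system page table size in bytes, term by term as above. */
static uint64_t system_pt_bytes(uint64_t phys, uint64_t vmalloced,
                                uint64_t ioremapped, uint64_t nr_tasks)
{
    return pte_bytes(phys) + pte_bytes(vmalloced) + pte_bytes(ioremapped)
         + nr_tasks * PER_TASK_BYTES;
}
```

For example, with 64 MB of physical memory, 8 MB each of vmalloc and
ioremap space and NR_TASKS = 512, this comes to 9043968 bytes, of which
8 MB is the per-task term. That term dominating is the reason for the
tricks below.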

Once you're up and running you could play tricks to have normal values for
NR_TASKS by moving the pmds in and out of the system page table as required
(or perhaps just reuse the system page table memory you aren't using for
normal kernel memory a la GFP_DMA).

> It seems a bit wasteful that process pages will have two PTEs, one in
> the relevant process page table and one in the system page table.

Why? You lose 0.78% of your physical memory (one extra 4-byte PTE per
512-byte page) compared to the more complicated design, which shouldn't
hurt too much. The more complicated design might make sense if you have
tons of physical memory, though, so that you can use all of it (where I'd
guess "tons" to be about 1.8 GB, not knowing too much about the
architecture).

> If we could get away without needing the system PTE, then this would
> either provide more space for #2 and #3 above, or reduce the size
> of the system page table.

> How much space tends to be vmalloc()-ed in a running system?

See the discussion for alpha a week or so ago. It tends to not be very much
but for some applications (TUX, for example), it's expected to be most of
physical memory.

Philipp Rumpf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


This archive was generated by hypermail 2b29 : Mon Oct 23 2000 - 21:00:09 EST