RE: Large physical memory in 32 bit kernels

Frederick Staats (fstaats@ichips.intel.com)
Mon, 18 Jan 1999 14:12:54 -0800

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Philip Deacon: "Solaris to Linux Port..."
Previous message: David Woodhouse: "Re: xconfig"

Disclaimer: The opinions in this message are my own and don't
necessarily represent the opinions of my employer Intel
Corporation.

Kernel VM architectures are NOT my area of specialty which
is why I was asking other people for their opinions. Please feel
free to correct any factual errors in my message.

In linux that the segments you refer to are non-overlapping
in virtual memory but overlapping in physical
memory. The supervisor code can access ALL of physical
memory at one set of addresses and the user process can
address it's subset with a different set of addresses. These
addresses must fit into the same 32 bit pointers (because the
TLB cache is not flushed going between user and supervisor
code.) In other systems (FreeBSD for example) the kernel has
an intermediate (pmap) layer that manages what physical memory
the kernel has mapped into supervisor pages at any given time.
In FreeBSD the kernel can only access a small subset
of physical memory at any given time (e.g. I/O buffers in
user space, shared memory segments, text segments for code
being loaded into user space, kernel code, in memory
kernel data structures, I/O memory mapped devices... .)

What I'm looking for is a way in large memory machines
to build some (but not all) of this indirection into the linux
kernel. I would like to run a few large (1-3 GB) processes that
don't context switch very often. I think (???) I'm willing to
take a TLB cache flush hit for user context (only) switches of
my large processes because I'm going to swap something to/from
disk anyway. If you want to run these kinds of jobs today
64 bit servers can solve the problem. But if your problems
are memory/CPU bound, but not I/O bound then 64 bit servers
are overkill.

Some people may be interested in a very different problem of
running thousands (tens of thousands) of smaller active processes
at the same time with out swapping to disk.

> -----Original Message-----
> From: N. D. Culver [mailto:ndc@alum.mit.edu]
> Sent: Sunday, January 17, 1999 5:56 PM
> To: fstaats@ichips.intel.com
> Subject: Memory question
>
>
> Hi,
>
> I saw your post to the Linux kernel list and wonder if
> you could correct me if I'm wrong below:
>
> A task with paging enabled is defined to have 2 segments
> USER segment starting at virtual address 0 and SUPERVISOR
> segment starting at physical/virtual address 0. Both segments
> are defined to be 4G. Access to the supervisor segment is via
> call gates. USER code is loaded starting at 128K virtual and
> pages below 128K are mapped to low 128K physical which is the
> domain of the SUPERVISOR code (the pages are marked as S).
>
> When the SUPERVISOR code (CPL 0) runs it can disable the
> paging enabled (PE) bit and call some other physically located
> segment without incurring TLB flush/reload. Eventually when
> the SUPERVISOR segment is preparing to return to the USER
> segment it can re-enable the PE bit and do the return.
>
> IF I AM RIGHT THERE WILL BE NO TLB FLUSH/RELOAD when the
> SUPERVISOR returns because CR3 has not changed???

I think the key is that the supervisor and the user spaces
share the SAME 32 virtual address space (user will trap
when accessing pages marked supervisor.) In order to get
the supervisor and the kernel separate 32 bit spaces you
need to flush the ENTIRE TLB cache every time you switch
between the two modes. The flushing isn't to expensive
in of itself... the fact that every page lookup will be
a miss in the TLB after every system call is a killer!

> If the above is true then it should be possible to allow
> each task to have a full 4G-128K of virtual memory and still
> have swift response to system calls.

The supervisor can access the same physical memory with
different virtual addresses than an individual user process
will use. In linux the kernel has it's own separate mapping
to the entire physical memory in one continuous block.

> Thanks,
> Norm Culver

Disclaimer: The opinions in this message are my own and don't
necessarily represent the opinions of my employer Intel
Corporation.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Philip Deacon: "Solaris to Linux Port..."
Previous message: David Woodhouse: "Re: xconfig"