Re: Linux and physical memory

Stephen C. Tweedie (sct@redhat.com)
Sat, 23 Jan 1999 19:37:23 GMT


Hi,

On Fri, 22 Jan 1999 00:11:09 -0800, "David S. Miller"
<davem@dm.cobaltmicro.com> said:

> This is where your whole scheme is dead wrong.

> There should be no tlb invalidation issues, everything should be local
> to the processor and the page tables being used by him. This is the
> whole crux behind my "make the mapping locally and temporarily only at
> the time of the access" scheme.

Danger! Danger!

We cannot rely on this working on an entirely per-CPU basis. Think
about read(2) and reading from the page cache: we have to kva-map the
page and then copy it to user space. The user space access can page
fault. We can be rescheduled on a different CPU.
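
To make the hazard concrete, something like this is the path I'm
worried about (kva_map_page(), kva_unmap_page() and page_cache_find()
are made-up names for illustration, not existing interfaces;
copy_to_user() is the real thing):

/*
 * Rough sketch of the read(2) page-cache copy path in question.
 */
static ssize_t read_one_page(struct file *file, unsigned long index,
			     char *buf, size_t count)
{
	struct page *page = page_cache_find(file, index);
	void *kva = kva_map_page(page);	/* set up the kernel mapping */
	ssize_t ret = count;

	/*
	 * copy_to_user() can fault on the user buffer, sleep in the
	 * fault handler, and wake up on a different CPU.  If the kva
	 * page tables persist across that switch, the new CPU may
	 * still hold a stale TLB entry for this kva.
	 */
	if (copy_to_user(buf, kva, count))
		ret = -EFAULT;

	kva_unmap_page(page, kva);
	return ret;
}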

If (as we'd prefer) the kva page tables are marked as persistent over
context switch (we can do that on P-Pro and later Intel CPUs), then we
have just found ourselves able to schedule to another CPU without any
reliable protection against stale TLB entries.

We have to cater for the same kva being used on multiple CPUs, or else
just pin the process to that CPU while the kva is in use (yuck).

There are obvious ways around this, of course, such as keeping a count
of kva-locked pages in each process's task struct, and doing an
invalidate on the kva range every time we reschedule any process with
non-zero kva-locked pages onto a different CPU. That at least will
avoid any extra work in the normal straight-through non-faulting code
path.
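
Something along these lines, purely as a sketch; the kva_pages and
last_cpu fields and the KVA_START/KVA_END window are invented here for
illustration:

/*
 * Called from the scheduler just before "next" starts running.
 */
void kva_check_migration(struct task_struct *next)
{
	int cpu = smp_processor_id();

	if (next->kva_pages && next->last_cpu != cpu)
		/*
		 * The task holds live kvas but last ran elsewhere; this
		 * CPU may hold stale TLB entries for them, so flush the
		 * kva range before letting the task touch them.
		 */
		flush_tlb_range(next->mm, KVA_START, KVA_END);

	next->last_cpu = cpu;
}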

The other advantage of a common pool of kva addresses is that it should
allow us to have a "cache" of kvas: for example, all commonly used page
cache pages will already have a kva assigned, and we won't need to
perform _any_ tlb invalidation, local or remote, to access them.
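
Roughly what I have in mind (the per-page "virtual" field and
kva_assign() are invented for the example, and locking and eviction are
glossed over entirely):

/*
 * Cache the assigned kva with the page, so a hot page-cache page
 * costs no mapping work and no invalidation at all.
 */
static void *kva_get(struct page *page)
{
	if (page->virtual)			/* cache hit: kva already set up */
		return page->virtual;

	page->virtual = kva_assign(page);	/* slow path: allocate and map a kva */
	return page->virtual;
}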

> Really, how much does the following cost:

> __cli();
> *pte1 = make_pte(PAGE_KERNEL, MAGIC_VADDR1);
> *pte2 = make_pte(PAGE_KERNEL, MAGIC_VADDR1);
> touch_high_pages_in_some_way()...
> *pte1 = 0; flush_tlb_page(MAGIC_VADDR1);
> *pte2 = 0; flush_tlb_page(MAGIC_VADDR1);
> __sti();

Good question. As I've said, we cannot make this atomic because of the
possibility of taking a page fault in the middle of it, but otherwise
this is what we ideally want to aim for. Just how expensive is a
single-page local tlb flush on the various 32-bit architectures? If it
is cheap enough, then using the kva space as a uniform, cross-CPU cache
to avoid tlb flushes isn't really worthwhile. If it is expensive, then
eliminating invalidations for access to commonly touched pages will be
worth the extra expense involved when we need to pass modified kvas
between CPUs.
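
If anyone wants to measure it, something as crude as this would do for
a ballpark cycle count on ia32; invlpg is privileged, so this only
makes sense in kernel context, and it's only a sketch, not something to
merge:

static inline unsigned long long rdtsc(void)
{
	unsigned long long t;
	__asm__ __volatile__("rdtsc" : "=A" (t));
	return t;
}

static unsigned long time_one_invlpg(void *addr)
{
	unsigned long long before, after;

	before = rdtsc();
	/* Flush the single TLB entry covering addr. */
	__asm__ __volatile__("invlpg %0" : : "m" (*(char *)addr) : "memory");
	after = rdtsc();

	return (unsigned long)(after - before);
}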

> 1) Local tlb flushes == 2 * X86_CYCLES_PER_TLB_PAGE_FLUSH

If it's that cheap then yes, do it per CPU and invalidate everything if
we detect one of the processes crossing between CPUs.

--Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/