Re: Can context switches be faster?
From: Andrew James Wade
Date: Thu Oct 12 2006 - 22:55:08 EST
On Thursday 12 October 2006 14:29, John Richard Moser wrote:
> How does a page table switch work? As I understand there are PTE chains
> which are pretty much linked lists the MMU follows; I can't imagine this
> being a harder problem than replacing the head.
Generally, the virtual memory mappings are stored as high-fanout trees
rather than linked lists. (ia64 supports a hash-table-based scheme,
but I don't know if Linux uses it.) But the bulk of the mapping
lookups will actually occur in a cache of the virtual memory mappings
called the translation lookaside buffer (TLB). It is from the TLB and
not the memory mapping trees that some of the performance problems
with address space switches originate.
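
To make the "high-fanout tree" concrete, here is a rough sketch of how a
virtual address selects a path through such a tree. The layout is an
assumption on my part, not something taken from the discussion above: it
uses x86-64 style 4-level paging with 4 KiB pages, where each level
consumes 9 bits of the address (512 entries per table) and the low 12
bits are the offset within the page. The names split_va and va_parts are
made up for the illustration.

```c
#include <stdint.h>

/* Assumed layout: x86-64 style 4-level paging with 4 KiB pages.
 * Each table has 512 entries (9 index bits); 12 bits of page offset. */
enum { LEVELS = 4, BITS_PER_LEVEL = 9, PAGE_SHIFT = 12 };

/* Hypothetical helper type: the per-level indices of one address. */
struct va_parts {
    unsigned index[LEVELS]; /* index[0] selects within the top-level table */
    unsigned offset;        /* byte offset within the final 4 KiB page */
};

/* Split a virtual address into the table index used at each level of
 * the walk, plus the offset within the page. */
static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    for (int level = 0; level < LEVELS; level++) {
        int shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level);
        p.index[level] =
            (unsigned)((va >> shift) & ((1u << BITS_PER_LEVEL) - 1));
    }
    p.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
    return p;
}
```

A walk that misses the TLB has to follow one table entry per level, so
under this assumed layout each miss costs up to four dependent memory
reads before the translation is known.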
The kernel can tolerate some small inconsistencies between the TLB
and the mapping tree (it can fix them in the page fault handler). But
for the most part the TLB must be kept consistent with the current
address space mappings for correct operation. Unfortunately, on some
architectures the only practical way of doing this is to flush the TLB
on address space switches. I do not know if the flush itself takes any
appreciable time, but each of the subsequent TLB cache misses will
necessitate walking the current mapping tree. Whether done by the MMU
or by the kernel (implementations vary), these walks in the aggregate
can be a performance issue.
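
The effect of a flush can be shown with a toy model. This is only a
sketch under my own assumptions: a direct-mapped TLB with a hypothetical
64 entries, no address space tags, and a miss counter standing in for
the cost of a tree walk. Real TLBs differ in size, associativity and
replacement policy, and all the names here (toy_tlb, tlb_access and so
on) are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64 /* hypothetical size, chosen for illustration */

struct tlb_entry {
    bool valid;
    uint64_t vpn; /* virtual page number cached in this slot */
};

struct toy_tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned long misses; /* each miss stands for one mapping tree walk */
};

/* Invalidate every entry, as on an address space switch without tags. */
static void tlb_flush(struct toy_tlb *t)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
}

static void tlb_init(struct toy_tlb *t)
{
    t->misses = 0;
    tlb_flush(t);
}

/* Look up one page. On a miss, count it and fill the slot, which is
 * where a real system would walk the mapping tree. Returns true on hit. */
static bool tlb_access(struct toy_tlb *t, uint64_t vpn)
{
    struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];

    if (slot->valid && slot->vpn == vpn)
        return true;
    t->misses++;
    slot->valid = true;
    slot->vpn = vpn;
    return false;
}

/* Touch pages 0..npages-1 once each. */
static void touch_working_set(struct toy_tlb *t, uint64_t npages)
{
    for (uint64_t vpn = 0; vpn < npages; vpn++)
        tlb_access(t, vpn);
}
```

With a 32-page working set, the first pass takes 32 misses and a second
pass takes none. After tlb_flush() the same pass takes 32 misses again,
even though the mappings themselves never changed.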
On some architectures the L1 cache can also require attention from the
kernel on address space switches for correct operation. Even when the
L1 cache doesn't need flushing, a change of address space will generally
be accompanied by a change of working set, leading to a period of
elevated miss rates in the L1/L2 caches.
Microbenchmarks can overlook the cache miss costs associated with context
switches. But I believe the costs of cache thrashing and flushing are
the reason that the time-sharing granularity is so coarse in Linux,
rather than the time it takes the kernel to actually perform a context
switch. (The default time-slice is 100 ms.) Still, the cache miss costs
are workload-dependent, and the actual time the kernel takes to context
switch can be important as well.
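
The trade-off between time-slice length and cache-refill cost comes down
to simple arithmetic. The sketch below assumes the whole penalty of a
switch (direct cost plus re-warming the caches and TLB) is paid once per
slice. The function names are mine, and the 100 us penalty used in the
note after the code is a made-up figure for illustration, not a
measurement.

```c
/* Fraction of a time slice lost to the per-switch penalty.
 * Both arguments are in the same unit (microseconds here). */
static double switch_overhead_fraction(double penalty_us, double slice_us)
{
    return penalty_us / slice_us;
}

/* Shortest slice that keeps the lost fraction at or below max_fraction. */
static double min_slice_us(double penalty_us, double max_fraction)
{
    return penalty_us / max_fraction;
}
```

With a hypothetical 100 us penalty, a 100 ms slice loses about 0.1% of
its time to the switch, while a 1 ms slice would lose about 10%. Keeping
the loss under 1% would need slices of at least 10 ms under the same
assumption.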
Andrew Wade