Re: aio-core why not using SuS? [Re: [rfc] aio-core for 2.5.29 (Re: async-io API registration for 2.5.29)]

From: Linus Torvalds
Date: Fri Aug 16 2002 - 23:46:09 EST

On Fri, 16 Aug 2002, Martin J. Bligh wrote:
> 1. We have a per-process UKVA (user-kernel virtual address space),

What is your definition of a "process"?

Linux doesn't really have any such thing. Linux threads share different
amounts of stuff, and a traditional process just happens to share nothing.
However, since they _can_ share more, it's damn hard to see what a
"per-process" mapping means.
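
To make the sharing point concrete, here is a minimal user-space sketch, assuming Linux with glibc's clone() wrapper. The helper name child_write_visible() and the stack size are made up for illustration. It creates a child either with or without CLONE_VM and reports whether the child's write to a global shows up in the parent:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size chosen for this sketch */

static volatile int shared_value;

/* Runs in the child: write to a global and exit. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;
    return 0;
}

/*
 * Hypothetical helper: returns 1 if the child's write is visible in the
 * parent, 0 if it is not, and -1 on error.
 */
int child_write_visible(int share_vm)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;

    shared_value = 0;

    /* The only difference between the two cases is one clone flag. */
    int flags = SIGCHLD | (share_vm ? CLONE_VM : 0);

    /* Pass the top of the buffer: the stack grows down on common targets. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    if (waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return shared_value == 42;
}
```

With CLONE_VM the two tasks run on one address space and the write is visible; without it the child gets its own copy, as after fork(). Other clone flags (CLONE_FILES, CLONE_SIGHAND and so on) select the rest of the sharing independently, which is why there is no single line where a "thread" ends and a "process" begins.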

> 2. A per task UKVA, that'd probably have to come out of something
> like the vmalloc space. I think Bill Irwin derived something like
> that from Dave's work, though I'm not sure it's complete & working
> as yet. Per task things like the kernel stack (minus the task_struct
> & waitqueues) could go in here.

And what is your definition of a "task"?

You seem to think that a task is one thread ("per task things like the
kernel stack"), i.e. a 1:1 mapping with a "struct task_struct".

But if you have such a mapping, then you _cannot_ make a per-task VM
space, because many tasks will share the same VM. You cannot even do a
per-cpu mapping change (and rewrite the VM on thread switch), since the VM
is _shared_ across CPU's, and absolutely has to be in order to work with
CPU's that do TLB fill in hardware (eg x86).

The fact is, that in order to get the right TLB behaviour, the _only_
thing you can do is to have a "per-MM UKVA". It's not per thread, and it's
not per process. It's one per MM, which is _neither_.

And this is where the problems come in. Since it is per-MM (and thus
shared across CPU's), updates need to be SMP-safe. And since it is per-MM,
it means that _any_ data structure that might be shared across different
MM's is really really dangerous to put in this thing (think virtual
caches on some hardware).

And since it is per-MM, it means that anything there can be more than one
of per MM (which is pretty much _every_ data structure in the kernel)
cannot go at a fixed address or anything like that, but needs to be
allocated within the per-MM area dynamically.
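
As a rough illustration of those two constraints (SMP-safe updates and dynamic allocation), here is a toy user-space model in C11. It is a sketch under stated assumptions, not kernel code: struct toy_mm, struct toy_task, UKVA_SLOTS, ukva_alloc() and ukva_free() are all hypothetical names, and a compare-and-swap stands in for whatever locking a real implementation would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define UKVA_SLOTS 8   /* hypothetical size of the per-MM area, in slots */

/* Toy stand-in for an address space: one slot map per MM. */
struct toy_mm {
    atomic_bool used[UKVA_SLOTS];
};

/* Toy stand-in for a thread: many tasks may point at the same MM. */
struct toy_task {
    struct toy_mm *mm;
};

void toy_mm_init(struct toy_mm *mm)
{
    for (int i = 0; i < UKVA_SLOTS; i++)
        atomic_init(&mm->used[i], false);
}

/*
 * Claim a free slot in the task's MM and return its index, or -1 if the
 * area is full. The compare-and-swap keeps the claim safe when tasks on
 * other CPUs allocate from the same MM at the same time.
 */
int ukva_alloc(struct toy_task *t)
{
    assert(t && t->mm);
    for (int i = 0; i < UKVA_SLOTS; i++) {
        bool expected = false;
        if (atomic_compare_exchange_strong(&t->mm->used[i], &expected, true))
            return i;
    }
    return -1;
}

/* Release a slot so that any task sharing the MM can reuse it. */
void ukva_free(struct toy_task *t, int slot)
{
    assert(t && t->mm);
    assert(slot >= 0 && slot < UKVA_SLOTS);
    atomic_store(&t->mm->used[slot], false);
}
```

Two tasks that share one toy_mm get different slots, so neither can assume that "its" kernel stack or file table lives at a fixed address in the area. The slot index has to be stored per task and looked up on every use.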

I suspect that you are used to the traditional UNIX "process" notion,
where a "process" has exactly one file table, and has exactly one set of
signals, one set of semaphores etc. In that setup it can be quite
convenient to map these into the VM address space at magical addresses.

You may also be used to per-CPU page tables or software TLB fill
situations, where different CPU's can have different TLB contents. That
can be used to have per-thread mappings. Again, that doesn't work on Linux
due to page table sharing and hw TLB fills.


