scheduling, threads, and more

Chris Smith (cd_smith@ou.edu)
Thu, 10 Jun 1999 00:19:54 -0500


I just had two thoughts, one of which is rather boring, and one of which is
really crazy and probably dumb, but I'll say it anyway.

First, what if we created a process structure that contains all the shared
attributes of a process (VM mapping, etc.), made it initially empty, and
then started moving things one by one into that process structure from the
current task_struct? That turns a huge change to support the normal
process/thread abstraction into a series of small, incremental changes, so
it can be done without killing the stability of the kernel. Advantages?
Mostly that it's easier (maybe even possible for the first time) to
implement proper POSIX signal handling behavior for threads.
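
As a very rough sketch of the shape I have in mind (the struct name and
layout are invented, not real kernel code; the member types are the ones
task_struct already points at today):

        /* Sketch only -- "struct process" and its layout are made up. */
        struct process {
                atomic_t                count;  /* tasks sharing these attrs */
                struct mm_struct        *mm;    /* shared VM mapping */
                struct files_struct     *files; /* shared open files */
                struct signal_struct    *sig;   /* shared signal handlers */
                /* ...other shared attributes migrate here one small
                 * patch at a time... */
        };

        struct task_struct {
                /* existing per-task fields stay put... */
                struct process          *process; /* new: the shared part */
                /* ... */
        };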

So if I did this, what would be the chances of its acceptance into the 2.3
kernel? It would probably be done by adding a CLONE_PROCESS flag to the
clone() system call that means "create a new task with the same process
attributes". All the other clone flags, if used without CLONE_PROCESS,
would still work as advertised, so there's no loss of compatibility.
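
From user space I'd expect it to look roughly like this; CLONE_PROCESS
(and the value I gave it) is made up for the sketch, everything else is
the existing clone() interface:

        #define _GNU_SOURCE
        #include <sched.h>
        #include <signal.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define CLONE_PROCESS   0x00010000  /* hypothetical, made-up value */
        #define STACK_SIZE      16384

        static int thread_fn(void *arg)
        {
                (void)arg;
                printf("new task sharing the caller's process attributes\n");
                return 0;
        }

        int main(void)
        {
                char *stack = malloc(STACK_SIZE);
                int pid;

                /* Without CLONE_PROCESS this is exactly what clone()
                 * does today. */
                pid = clone(thread_fn, stack + STACK_SIZE,
                            CLONE_PROCESS | CLONE_VM | CLONE_FS |
                            CLONE_FILES | CLONE_SIGHAND | SIGCHLD, NULL);
                printf("clone() returned %d\n", pid);
                return 0;
        }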

Comments?

Okay, now to the second and far dumber idea. While thinking about the
problems with two-level schedulers, I came up with an idea that might
create some interesting results. It'd be hard to implement, so I wouldn't
expect us to try it for quite a while, but here's the concept. You can all
point out how dumb it is.

The kernel recognizes processes, which are collections of resources that
own one or more "scheduling units". (One or more so as to support SCS
threads, but you can assume one for the remainder of this discussion.) A
scheduling unit is what contends for processor time slices from the
kernel. The important difference between a scheduling unit and a task is
that a scheduling unit can ask for as many processors as it wants, and the
kernel will assign it up to that many processors simultaneously. The
kernel stops at assigning time slices to scheduling units; it knows
nothing about differing stacks, register contexts, etc. As far as the
kernel cares, each processor has one (rather small) stack that is mapped
into every process's address space.
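
In kernel terms I'm picturing something like this; every name here is
invented, it's just to pin the idea down:

        /* Sketch only -- the kernel schedules sched_units, not threads;
         * user-space threads never appear below this level. */
        struct sched_unit {
                struct process  *proc;          /* owning resource container */
                int             nr_requested;   /* processors this unit wants */
                int             nr_assigned;    /* processors currently granted */
                void            *user_sched;    /* user-space scheduling routine
                                                   to enter when a slice is
                                                   granted */
        };

        /* One small stack per processor, mapped into every process's
         * address space; only ever used to run the user-space
         * scheduling routine. */
        extern void *percpu_sched_stack[NR_CPUS];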

When a scheduling unit gets assigned a processor, it calls a scheduling
routine using that processor's kernel-assigned stack. This drops the
processor out of kernel mode. The scheduling routine finds a thread to
execute in the time slice and does the actual context switch in user
space. Then the time slice runs completely in user space. If the thread
blocks on a mutex, another thread from the same process gets scheduled,
all without a system call. Only when no more threads in the process are
runnable will the scheduler abort the time slice early and schedule
another process.
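
The user-space half could look roughly like this (again, all names
invented; give_back_slice() stands in for whatever "abort my slice" call
the kernel would provide):

        #include <setjmp.h>
        #include <stddef.h>

        /* Minimal sketch of the user-space scheduling routine the kernel
         * drops into (in user mode, on the per-processor stack) whenever
         * this scheduling unit is granted a time slice. */
        struct uthread {
                jmp_buf          ctx;           /* saved register context */
                int              runnable;
                struct uthread  *next;
        };

        static struct uthread *run_queue;       /* guarded by a user lock */

        extern void give_back_slice(void);      /* hypothetical call */

        void sched_entry(void)
        {
                struct uthread *t;

                /* Blocking on a mutex just marks the thread !runnable and
                 * re-enters this routine -- no system call involved. */
                for (t = run_queue; t != NULL; t = t->next)
                        if (t->runnable)
                                longjmp(t->ctx, 1); /* user context switch */

                /* Only when nothing at all is runnable do we involve the
                 * kernel and give up the rest of the slice. */
                give_back_slice();
        }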

The per-processor stack that's shared between all processes is admittedly
a little strange, and at first glance looks like it creates security
problems. It shouldn't be an issue, though, as long as only scheduling is
done on that stack. The stack pointer can be reset before calling the
scheduling routine, so previous activity on the stack ceases to matter;
the only worry would be a process writing sensitive data onto the stack,
which should never happen because it's a very small stack devoted solely
to running the scheduling code.
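
The kernel-side hand-off would then be something like this sketch
(invented names again, building on the sched_unit sketch above; esp/eip
are the i386 pt_regs fields):

        #include <asm/ptrace.h>                 /* struct pt_regs (i386) */

        #define SCHED_STACK_SIZE 4096           /* "rather small" */

        extern void *percpu_sched_stack[];      /* from the earlier sketch */

        /* Before returning to user mode, point the user stack at the top
         * of this processor's shared stack and enter the unit's
         * scheduling routine.  Resetting esp every time is what makes the
         * shared stack harmless: the previous process's leftovers are
         * never reachable as live data. */
        static void enter_user_scheduler(struct pt_regs *regs,
                                         struct sched_unit *unit, int cpu)
        {
                regs->esp = (unsigned long)percpu_sched_stack[cpu]
                            + SCHED_STACK_SIZE;
                regs->eip = (unsigned long)unit->user_sched;
                /* fall back out through the normal syscall/interrupt
                 * return path */
        }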

Finally, the process has to manage the requested processor count of its
scheduling unit, which is a little challenging. If it made a system call
every time the number of runnable threads changed, that'd be a problem. I
see three ways to avoid this:
* First, this is only an issue for blocking on a mutex, etc. A lot of
blocks are already system calls, and the kernel can handle them internally
at very little cost.
* Second, the kernel could return a value from the system call (say,
set_concurrency) that means "stop increasing your processor request;
you're already above the number of processors I have". That removes the
need to keep updating the value once it passes that maximum.
* Finally, when a thread blocks, the value isn't immediately decreased;
if another (extra) processor gets allocated, use THAT to make the system
call (rough sketch after this list). If the thread wakes up before an
extra processor is scheduled, then there was never a problem.
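
Here's a rough sketch of that third approach from the user side;
set_concurrency() and everything else here is invented, and a real
version would need atomic updates rather than plain ints:

        /* nr_runnable:  threads that currently want a processor.
         * nr_requested: what we last told the kernel via
         *               set_concurrency(). */
        static int nr_runnable;
        static int nr_requested;

        extern int set_concurrency(int n);      /* hypothetical syscall */

        void on_thread_block(void)
        {
                nr_runnable--;          /* note it, but NO system call */
        }

        void on_slice_granted(void)
        {
                /* Called from the user scheduler whenever a processor
                 * shows up.  If this processor turns out to be surplus,
                 * spend its slice on the set_concurrency() call we
                 * skipped at block time.  If the blocked thread woke up
                 * first, the counts match and no call was ever needed. */
                if (nr_runnable < nr_requested) {
                        set_concurrency(nr_runnable);
                        nr_requested = nr_runnable;
                }
        }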

Any interesting thoughts on this? My biggest concern is the terrible
blurring of the line between user space and kernel space. Is there a
better way to handle this communication between user and kernel schedulers
without blurring that line beyond recognition? Or should I just forget
about this and try to get more sleep? :)

Chris
