Re: Scheduler activations (IIRC) question

From: Mike Galbraith
Date: Sun Aug 17 2003 - 03:34:34 EST


At 07:55 AM 8/17/2003 +0100, Jamie Lokier wrote:
Mike Galbraith wrote:
> >The point of the mechanism is to submit system calls in an
> >asynchronous fashion, after all. Proper task scheduling is
> >inappropriate when all we'd like to do is initiate the syscall and
> >continue processing, just as if it were an async I/O request.
>
> Ok, so you'd want a class where you could register an "exception handler"
> prior to submitting a system call, and any subsequent schedule would be
> treated as an exception? (they'd have to be nestable exceptions too
> right?... <imagines stack explosions> egad:)

Well, apart from not resembling exceptions, and no they don't nest :)

(ok, I only misunderstood _almost_ everything:)

You may be wondering what happens when I do five stat() calls, all of
which should be asynchronous (topical: to get the best out of the
elevator).

Nested? Not quite. At each stat() call that blocks for I/O, its
shadow task becomes active; that creates its own shadow task (pulling
a kernel task from userspace's cache of them), then continues to
perform the next item of work, which is the next stat().

The result is five kernel threads, each blocked on I/O inside a stat()
call, exactly as desired. A sixth kernel thread, the only one of my
program still running, continues the program's work.
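
As a rough illustration of the end state described above (five threads
blocked in stat(), one more carrying on with the program), here is a
minimal Python sketch. It is not the shadow-task mechanism itself: the
threads are created up front rather than only when a call actually
blocks, and the names stat_async and stat_all are made up for this
example. It assumes CPython, where each threading.Thread is backed by a
kernel thread.

```python
import os
import threading


def stat_async(path, results, index):
    """Run one stat() on its own thread and record the outcome."""
    try:
        results[index] = os.stat(path)
    except OSError as exc:
        # Keep the error as the result so the caller can inspect it.
        results[index] = exc


def stat_all(paths):
    """Start one thread per path, keep computing, then collect results."""
    results = [None] * len(paths)
    threads = [
        threading.Thread(target=stat_async, args=(path, results, i))
        for i, path in enumerate(paths)
    ]
    for t in threads:
        t.start()

    # The submitting thread continues with other work while the
    # stat() calls may be blocked on I/O.
    busy = sum(i * i for i in range(1000))

    for t in threads:
        t.join()
    return results, busy
```

Calling stat_all() with five paths gives the picture in the text: up to
five threads waiting inside stat() and a sixth doing the calculation.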

Oh. You just want to dispatch N syscalls from one entry to the kernel?

Soon, each of the I/O bound threads unblocks, returns to userspace,
stores its result, queues the next work of this state machine, adds
this kernel task to userspace's cache, and goes to sleep.
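
The life cycle above (unblock, store the result, hand on the next piece
of work, rejoin the cache, sleep) can be approximated in userspace with
a small pool of threads fed from a queue. This is only a sketch under
assumptions: TaskCache and its methods are hypothetical names, the pool
size is fixed rather than grown on demand, and "queues the next work" is
reduced to placing the result on a completion queue for the submitter
to act on.

```python
import os
import queue
import threading


class TaskCache:
    """Hypothetical userspace cache of idle threads."""

    def __init__(self, size):
        self.work = queue.Queue()   # pending calls
        self.done = queue.Queue()   # completed (argument, outcome) pairs
        self.threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(size)
        ]
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            item = self.work.get()  # asleep in the cache until work arrives
            if item is None:        # sentinel: leave the cache for good
                return
            func, arg = item
            try:
                outcome = func(arg)  # may block on I/O, e.g. os.stat
            except OSError as exc:
                outcome = exc
            self.done.put((arg, outcome))  # store the result, then loop

    def submit(self, func, arg):
        """Hand one call to whichever cached thread wakes first."""
        self.work.put((func, arg))

    def close(self):
        """Wake every cached thread with a sentinel and wait for it."""
        for _ in self.threads:
            self.work.put(None)
        for t in self.threads:
            t.join()
```

A caller would submit(os.stat, path) for each path, read completions
from the done queue as they arrive, and call close() when finished.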

As you can see, this achieves asynchronous system calls which are too
complex for aio(*), best use of the I/O elevator, and 100% CPU
utilisation doing useful calculations.

Other user/kernel scheduler couplings are possible, but what I'm
describing doesn't ask for much(**). Just the right behaviour from
the kernel's scheduling heuristic: namely, waker not preempted by
wakee. Seems to be the way it's going anyway.

If that's all you need, a SCHED_NOPREEMPT (synchronous wakeups) class
should do the trick. I thought you wanted a huge truckload more than
that.

-Mike
