Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

From: Ingo Molnar
Date: Sun Feb 25 2007 - 14:11:29 EST



* Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:

> Kevent is a _very_ small entity and there is _no_ cost of requeueing
> (well, there is list_add guarded by lock) - after it is done, process
> can start real work. With rescheduling there are _too_ many things to
> be done before we can start new work. [...]

actually, no. For example a wakeup too is fundamentally a list_add
guarded by a lock. Take a look at try_to_wake_up(). The rest you see
there is just extra frills that relate to things like 'load-balancing
the requests over multiple CPUs [which i'm sure kevent users would
request in the future too]'.
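
to make the "list_add guarded by a lock" point concrete, here is a toy
user-space sketch. It is not the kernel's try_to_wake_up(); the RunQueue
class and its wake_up()/pick_next() methods are hypothetical names made up
for this illustration, and it assumes a single queue with no
load-balancing or priorities:

```python
import threading
from collections import deque


class RunQueue:
    """Toy run queue: a list of runnable tasks protected by one lock.

    Hypothetical illustration only, not the kernel's data structure.
    """

    def __init__(self):
        self.lock = threading.Lock()
        self.tasks = deque()

    def wake_up(self, task):
        # The core of a wakeup in this model: take the lock, append to the list.
        with self.lock:
            self.tasks.append(task)

    def pick_next(self):
        # Take the lock and remove the oldest runnable task, if there is one.
        with self.lock:
            return self.tasks.popleft() if self.tasks else None


rq = RunQueue()
rq.wake_up("threadlet-A")
rq.wake_up("threadlet-B")
print(rq.pick_next())
```

in this model, queueing a task and queueing an event are the same
operation: one lock, one list insertion.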

> [...] We have to change registers, change address space, various tlb
> bits and so on - we have to do it, since task describes very heavy
> entity - the whole process. [...]

but ... 'threadlets' are called thread-lets because they are not full
processes, they are threads. There's no TLB state in that case. There's
indeed register state associated with them, and currently there can
certainly be quite a bit of overhead in a context switch - but not in
register saving. We do user-space register saving not in the scheduler
but upon /every system call/. Fundamentally a kernel thread is just its
EIP/ESP [on x86, similar on other architectures] - which can be
saved/restored in near zero time. All the rest is something we added for
good /work queueing/ reasons - and those same extras should either be
eliminated if they turn out to be not so good reasons after all, or they
will be wanted for kevents too eventually, once it matures as a work
queueing solution.

> I think it is _too_ heavy to have such a monster structure like
> task(thread/process) and related overhead just to do an IO.

i think you are really, really mistaken if you believe that the fact
that whole tasks/threads or processes can be 'monster structures'
somehow has any relevance to scheduling/task-queueing performance and
scalability. It does not matter how large a task's address space is -
scheduling only relates to the minimal context that is in the CPU. And
most of that context we save upon /every system call entry/, and restore
it upon every system call return. If it's so expensive to manipulate,
why can the Linux kernel do a full system call in ~150 cycles? That's
cheaper than the access latency to a single DRAM page.

for the same reason it has no relevance that the full kevent-based
webserver is a 'monster structure' - a single request's basic queueing
operation is still cheap. The same is true of tasks/threads.

Really, you don't even have to know or assume anything about the
scheduler, let's just do some elementary math here:

the reqs/sec your sendfile+kevent based webserver can do is 7900 per
sec. Let's assume you will write further great kevent code which will
optimize it further and it goes up to 10,100 reqs per sec (roughly 100
usecs per request), ok? Then also try how many reschedules/sec your
Athlon64 3500 box can do. My guess is: about a million per second (1
usec per reschedule), perhaps a bit more.

Now let's assume that a threadlet based server would have to
context-switch for /every single/ request served. That's totally
over-estimating it, even with lots of slow clients, but let's assume it,
to judge the worst-case impact.

So if you had to schedule once for every request served, you'd have to
add 1 usec to your 100 usecs cost, making it 101 usecs. That would bring
your 10,100 requests per sec to 10,000 requests/sec, under a threadlet
model of operation. Put differently: it will cost you only 1% in
performance to schedule once for every request. Or let's assume the task
is totally cache-cold and you'd have to add 4 usecs for its scheduling -
that'd still only be 4%. So where is the fat?
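
the arithmetic above can be checked with a small sketch. The function
names are made up for this illustration; the inputs (10,100 reqs/sec, 1
usec and 4 usecs of added scheduling cost per request) are the assumed
figures from this mail, not measurements:

```python
def throughput_with_overhead(base_reqs_per_sec, extra_usecs_per_req):
    """Requests/sec after adding a fixed per-request cost in microseconds."""
    base_usecs = 1_000_000 / base_reqs_per_sec  # time per request today
    return 1_000_000 / (base_usecs + extra_usecs_per_req)


def slowdown_percent(base_reqs_per_sec, extra_usecs_per_req):
    """Throughput lost, as a percentage of the original rate."""
    new_rate = throughput_with_overhead(base_reqs_per_sec, extra_usecs_per_req)
    return 100.0 * (base_reqs_per_sec - new_rate) / base_reqs_per_sec


# One reschedule per request: warm case (1 usec) and cache-cold case (4 usecs).
for extra in (1.0, 4.0):
    rate = throughput_with_overhead(10_100, extra)
    loss = slowdown_percent(10_100, extra)
    print(f"+{extra:.0f} usec/request: {rate:,.0f} reqs/sec ({loss:.1f}% slower)")
```

computed exactly, 10,100 reqs/sec is about 99 usecs per request, so the
100 usecs figure is a rounding. The conclusion is unchanged: roughly 1%
lost at 1 usec per reschedule, and just under 4% at 4 usecs.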

Ingo