Re: Interesting analysis of linux kernel threading by IBM

From: Sean Hunter (sean@uncarved.co.uk)
Date: Sat Jan 22 2000 - 04:22:43 EST

Next message: Andre Hedrick: "Re: 2.2.10-14 i686 SMP: IDE RAID-5 array hangs on mount"
Previous message: Andre Hedrick: "Re: Booting Linux past the 8Gig boundary?"
In reply to: Davide Libenzi: "Re: Interesting analysis of linux kernel threading by IBM"
Next in thread: Davide Libenzi: "Re: Interesting analysis of linux kernel threading by IBM"
Reply: Davide Libenzi: "Re: Interesting analysis of linux kernel threading by IBM"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Jan 21, 2000 at 07:21:45PM +0100, Davide Libenzi wrote:
>
> Hi Ian,
>
> Friday, January 21, 2000 3:47 PM
> Ian Soboroff <ian@cs.umbc.edu> wrote :
> > in the vast majority of cases, i suspect it's easier and probably
> > better to redesign the app than redesign the scheduler. that said,
> > the improvements already done are quite good and needed.
>
> why You all speak about worse designed multithreaded apps.
> Every technology bad applied is worse !
> Suppose You have a rendering job to be done.
> It can be subdivided in a highly parallel system with distinct threads that
> can run together :
>
> 1) Viewing transformation
> 2) Triangulation
> 3) Scan conversion
> 4) Texturing
> 5) Illumination
> 6) Frame output
>
> just to keep it simple.
> Now can someone tell me why I would not split my job into threads ?

Err...because the performance will suck?

> You can state that I don't have benfits in uniprocessor systems.
> But I have in SMP, that is, IMHO, the future of computing technology.
> You can find the needs of parallelism starting from CPUs executions units up
> to
> complex software systems up to daily work organization.
> And if the OS is the bottleneck of parallelism we must try to improve it,
> not to avoid multithreading.

No, yet again the _app_ design you propose is the bottleneck.

> Even if modern CPU architectures like IA64 which leave to high tuned
> compilers
> the work of parallelize ( I don' know if this word is correct ;) ) the code,
> I think that
> the software designer can do even better, or at least improve, the compiler
> job.

The software designer can improve the design and architecture of the
system so that the system does not suck. Yea, even to the point that
it rocks.

A java guy contributed to this thread a while ago, going "Well, say
you want to take advantage of 128 processors. You'll want to have 128
threads right there". Err, not if you want good performance and not
if your task doesn't suit that level of parallelism. This is, to
paraphrase Alan Cox "My programming sucks, fix the language. Err...
my programming still sucks, fix the kernel"

You need to do the design by looking at nature of the task, rather
than enforcing a preconcieved C-S paradigm (threads, or <insert
buzzword here>) on it. Look at how your task is structured. In the
above, the six tasks you have listed are not possible in parallel.
For instance, frame output can't happen until the other tasks have all
completed. You can't do the ray-trace for the illumination until you
know what the texture's like (bumpy, soft, clear, reflective etc)
otherwise your illumination will be wrong. You can't map a texture
until you know what the thing you're mapping onto is structured like,
so you can't do the texture until the first three tasks are
completed. etc etc

As such, all the time you spend cloning off and synchronising those
threads is pure waste. You might as well chuck those cycles into the
bin. That's not the sort of performance compromise the high-perf guys
I know would want to live with.

Secondly, the threads in your example would spend most of their lives
blocked, waiting for the other threads to finish, which would not lead
to the long runqueues you postulate.

Now, what you _could_ probably do, is divide the image up into regions
or objects and do bits of the first four tasks you mentioned in
parallel over those regions, then do the illumination in one step
(this _could_ be seperated into parallell tasks, but you'd need to be
pretty sure this was a win) when they've all completed (because
otherwise you'd have problems with the boundaries of the regions), and
output the frame. These divisions would most likely be seperate
_processes_ rather than threads, so they can run on different
machines. This gets rid of cache/processor affinity issues, and
ensures that your boxes all have about load avg 1 (ie optimal work).
I belive this is similar to what some "rendering farms" do.

There is _no_ benefit to just blindly forking off loads of threads,
other than to spend most of your time synchronising them, scheduling
them etc. You'll totally trash your cache (even _if_ your task is
massively parallel), and whatever scheduler design you choose, you may
as well prompt the user for what task to schedule next for all the
difference it'll make compared to a decent app design.

Poor design leads to poor performance. Always has, always will.

Sean

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andre Hedrick: "Re: 2.2.10-14 i686 SMP: IDE RAID-5 array hangs on mount"
Previous message: Andre Hedrick: "Re: Booting Linux past the 8Gig boundary?"
In reply to: Davide Libenzi: "Re: Interesting analysis of linux kernel threading by IBM"
Next in thread: Davide Libenzi: "Re: Interesting analysis of linux kernel threading by IBM"
Reply: Davide Libenzi: "Re: Interesting analysis of linux kernel threading by IBM"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:27 EST