Re: Forking (fwd)

Clemens Huebner - Sun Germany Technical Solution Center - Munich (Clemens.Huebner@Germany.Sun.COM)
Mon, 16 Jun 1997 10:56:35 +0200


> From glamm@mountains.ee.umn.edu Fri Jun 13 19:59:30 1997
> From: Robert Glamm <glamm@mountains.ee.umn.edu>
> Subject: Re: Forking (fwd)
> To: linux-kernel@vger.rutgers.edu
> Date: Fri, 13 Jun 1997 11:06:04 -0500 (CDT)
>
>
> > > > > Is it possible to create a new process which shares the data with the
> > > > > parent?
> > > >
> > > > Yes, by specifying such an option to clone() (which fork() uses to
> > > > implement its behaviour, btw). That is not meant to be used in
> > > > applications, though. You should use a thread library that supports the
> > > > clone() system call (such as linuxthreads).
> > >
> > > So why is clone() not meant to be used in applications? If I want
> > > full control over how my program is parallelized across multiple processors,
> > > > I'm going to want to use clone(). If you claim that we shouldn't use clone()
> > > for "portability" reasons, that's a bunch of BS - if you want your
> > > application to all-out perform on SMP machines, you have to hand-tune
> > > it for each architecture, making portability pointless anyway.
> >
> > Because you get the same performance/control with a threads library, but you get a fully
> > > portable program that way.
>
> Wow, one would think from the above reply that the comment I made wasn't
> even read...

No reason to get personal here. I won't dispute anything you said below, but it's beside the
point. There is a function in the threads library which calls clone() for you. So why do it by
hand? All the optimisations you talk about don't have ANYTHING to do with how you create your
thread.
I won't tell you to use any other stuff from the threads library, but there simply is no good
argument against using a thread library for creating and destroying threads.
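
To illustrate, here is a minimal sketch of creating and destroying threads through the library
rather than through clone() directly. It assumes a POSIX-threads interface such as the one
linuxthreads provides. The names `struct shared`, `worker`, `run_workers` and `MAX_WORKERS` are
hypothetical, made up for this example. The threads all see the same data, which is what the
original question asked for.

```c
/* Sketch only: worker threads sharing the creator's data via pthreads. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16          /* hypothetical upper bound for this example */

/* Hypothetical shared state: one object visible to every thread. */
struct shared {
    pthread_mutex_t lock;       /* protects counter */
    long counter;               /* incremented by all workers */
    long increments;            /* iterations per worker, set before start */
};

/* Thread body: bump the shared counter under the lock. */
static void *worker(void *arg)
{
    struct shared *s = arg;
    long i;

    assert(s != NULL);
    for (i = 0; i < s->increments; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Start nworkers threads, wait for them, and return the final counter.
 * Returns -1 on a bad argument or if a thread could not be created. */
long run_workers(int nworkers, long increments)
{
    pthread_t tids[MAX_WORKERS];
    struct shared s;
    int started, i;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    s.counter = 0;
    s.increments = increments;
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;

    /* The library creates the thread for us; no hand-written clone() call. */
    for (started = 0; started < nworkers; started++)
        if (pthread_create(&tids[started], NULL, worker, &s) != 0)
            break;

    /* Join every thread that was actually started before s goes away. */
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&s.lock);
    return started == nworkers ? s.counter : -1;
}
```

Calling run_workers(4, 1000) from main() and linking with -lpthread should give 4000 if all
threads start. The mutex here is the library's own. The sketch only covers creation and
teardown, so the locking could be replaced with hand-tuned code without touching
pthread_create() or pthread_join().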

Clemens
>
> NO, you don't get the same performance/control with a threads library.
> If anyone would like to argue this point with me I will happily hand-tune
> some SMP code for various architectures that will beat any
> threads-library coded software. Here's why:
>
> 1) if you're coding for SMP, you want to extract _AS MUCH PERFORMANCE
> AS POSSIBLE_ from the system you're running on. Otherwise, why
> bother to code for SMP at all? I'm talking apps here that really
> _need_ SMP performance, not some silly little Web server process.
> And if anyone gives me another RC5 cracking example, I will personally
> shoot them. Why? Because RC5 is so parallel it's trivial -- it's
> a simple matter to divide up the keyspace and distribute it across
> multiple machines without much overhead regardless of how that
> distribution is done. There are few applications that are this
> data parallel.
>
> 2) Now, given that you want to extract as much performance as possible
> from your SMP machine, you need to code it on a _machine by machine_
> basis if you want it to run on different platforms. Thus, the portability
> achieved by the threads library is pointless. Given an app coded
> using the threads library and one hand-tuned per machine across
> a bunch of different platforms, the hand-tuned ones will win hands down.
> When I say `hand-tuned per platform' I mean taking _all_ of the
> machine's characteristics into account, either at compile-time or at
> run time: L1/L2 cache sizes, time per memory reference, average
> disk access time and transfer rates (if necessary), average semaphore
> contention time between processors, etc. Bottom line: if you tell
> me that your nice portable threads library can do better than
> my (assembly!) semaphore/lock code + hand-tuned SMP code, you're out
> of your mind for any reasonably complex SMP problem.
>
> Commercial applications developers must agree with my points, otherwise
> why would they spend so much time and effort hand-tuning their SMP code
> for so many different platforms? E.g.: Cray has an entire DIVISION
> devoted to helping app developers tune their packages to particular
> platforms. IBM (for their cluster & SMP machines) and SGI are both
> in similar situations.
>
> Of course, if you don't agree that you want to get as much performance
> as possible from your SMP machine, then the above really doesn't apply.
> But then why go through all the trouble of using SMP in the first place
> if you don't care about performance?
>
> --
> "If HP was only an $8 billion | Bob Glamm H: +1 612 6239437 W: +1 612 6268981
> company like Sun, we also | URL: http://www-mount.ee.umn.edu/~glamm
> might be less ambitious." +-----------------------------------------------
> -- HP's Lewis Platt referring to Sun's refusal to support Windows NT