RE: MOSIX and kernel mods.

Kamran Karimi (karimi@cs.uregina.ca)
Sat, 6 Mar 1999 15:10:26 -0600 (CST)


On Fri, 5 Mar 1999, Chris McClellen wrote:

> Why code a parallel application to be slow? If you are diving into
> parallelism of any kind, the assumption is that you are looking for some
> speed. For instance, you have decided that you have some tasks that
> can take place at the same time; ie a calculation that can be distributed.
> Why put all the effort into parallelization, which implicitly means
> looking at classical critical section problems, deadlocks, etc for a
> nice big slowdown? "Parallel for free" isn't good if it runs worse than
> a non-parallel program. It would also stink to have someone have to start
> over because the way they used an abstraction is horrible, but seemed
> sensible at the time.

You are confusing distinct concepts. There is a difference between
"distributed" and "parallel", even though for you and many other people
the _only_ reason for going distributed is to make an application
execute in parallel. That need not always be the case.

> To get a decent-performing application, you have as much as admitted
> that the programmer must be aware of the pitfalls of DIPC. So then why
> is this abstraction truly easier than, say, user-level message passing?
>
> Is it so that you can say something like:
>
> lock();
> memcpy(sharedp, ...);
> unlock()?
>
> Instead of :
>
> someapi_sendmessage(pgroup, sharedp, size)?

As Richard Gooch likes to say, you are missing the point. This may apply
to most other DSM systems, but in DIPC an explicit lock is not
required. The programmer can optionally use one if necessary.
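
For example, a minimal sketch of the DIPC style (assuming a DIPC-patched
kernel with dipcd running on each node; the IPC_DIPC flag value shown is
illustrative, the real constant comes from DIPC's patched headers):

  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  #ifndef IPC_DIPC
  #define IPC_DIPC 00010000   /* illustrative value; see the DIPC docs */
  #endif

  int main(void)
  {
      int shmid;
      int *p;

      /* An ordinary System V segment, marked as distributed. */
      shmid = shmget(0x1234, 4096, IPC_CREAT | IPC_DIPC | 0666);
      if (shmid < 0) { perror("shmget"); return 1; }

      p = (int *) shmat(shmid, NULL, 0);
      if (p == (int *) -1) { perror("shmat"); return 1; }

      /* With strict consistency this store is seen as-is by readers
         on other machines; no lock()/unlock() pair is needed, though
         SysV semaphores can be used when ordering matters. */
      p[0] = 42;

      shmdt(p);
      return 0;
  }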

> Yeah, in one implementation, it looks like any other pointer, and true,
> perhaps there is really a shared pool. But gosh, it seems like it would be
> slow to keep banging on sharedp in the first case. Caching doesn't seem
> like a great idea if sharedp is updated frequently.
>
> In another, sharedp may actually live in each address space, and perhaps
> the programs update the contents of a local copy of sharedp upon receipt
> of a message. Again, frequent changes could cause a message storm.

Exactly. DSM just hides the underlying message passing. They both suffer
from the same limitations.

> However, it seems to me that if you are writing a parallel program that
> is beyond SMP (ie, parallel across a cluster of heterogeneous computers),
> there are a lot of issues to deal with and if you are capable of
> breaking the problem up correctly, you are equally capable of using
> some other API that gives the same result without doing what amounts
> to operator overloading.
>
> On the other hand, if message passing APIs, etc, are too complicated
> for people, I am going to go out on a limb here: they probably shouldn't
> be writing distributed applications. If they understand the issues,
> the APIs are not *that* complicated.

I certainly disagree. This is like claiming that because assembly
language better reflects the way a computer works, people who can't
learn it shouldn't write any programs at all.

> As I said, the only "free" here may be a slow program. Programs that
> aren't distributed to begin with may not need to use IPC at all, least
> of all DIPC.

All a performance-minded DIPC programmer needs to be aware of is that
writes to nearby memory regions can slow the program down: DIPC
replicates pages for the readers so they can proceed in parallel, and a
write invalidates those replicated copies. DIPC's documentation suggests
some ways to work around this.
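
For instance, one hypothetical layout (mine, not from the DIPC docs):
pad each writer's data out to its own page, so a write by one worker
does not invalidate the replicated page the other nodes are reading:

  #define PAGE 4096   /* assuming 4 KB pages; query the real size
                         with sysconf(_SC_PAGESIZE) */

  struct slot {
      long counter;                    /* this worker's data      */
      char pad[PAGE - sizeof(long)];   /* keep slots a page apart */
  };

  /* slots points into the DIPC shared segment (shmat returns a
     page-aligned address); worker i touches only slots[i], so
     readers of the other slots keep their replicated page copies
     and proceed in parallel. */
  void work(struct slot *slots, int i, long n)
  {
      long j;
      for (j = 0; j < n; j++)
          slots[i].counter++;
  }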

However, I stress again that unlike most other DSM systems, DIPC uses
strict consistency. The programmer _can_ write his code with no
synchronization at all, if that suits his purposes.

> Unfortunately, you have to know what you are dealing with. Local memory
> is faster than memory across the network. Fear the people who feel they
> can't deal with that difference. If you want performance, you want
> to know what's expensive and what's not. People have to design stuff all
> the time that works efficiently over slow networks vs fast networks.
> They need to know what the target is because software architecture will
> differ depending on what you are dealing with (ie you may need caching
> in one instance, but not in the other to get reasonable performance).
> This is a reality of life. In the future, when all networks have
> uniform bandwidth, then this will be irrelevant :-)

DIPC makes the programming task easier by freeing the programmer from
worrying about invoking the right send/receive calls at the right places
in his code. I think this is a great help. Programmers who want more
performance should give some thought to proper data exchange patterns.
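
To make the contrast concrete, here is a sketch (someapi_* echoes the
hypothetical call from your example; it is not a real library):

  struct shared_state { double result; };

  /* DSM style: the exchange is implicit in the store. DIPC moves
     or replicates the page when another node touches it. */
  void publish_dsm(struct shared_state *shared, double v)
  {
      shared->result = v;    /* no send/receive call anywhere */
  }

  /* Message-passing style: a matching send and receive must be
     placed correctly at every exchange point in the code. */
  extern int someapi_sendmessage(int pgroup, const void *buf, int len);

  void publish_msg(int pgroup, double v)
  {
      someapi_sendmessage(pgroup, &v, sizeof v);
  }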

> The "easy way" could turn out to be more expensive when the people who
> don't understand what they are doing start using DSM, and a company
> has to pay for the software twice: once for the person doing DSM in
> a "sensible" way, and once for the rewrite in which someone else comes
> in and uses something else, or fixes the program so that the program KNOWS
> about HOW DSM operates. Having to know HOW DSM operates (and thus
> knowing the pitfalls) kills the notion that it makes things easier.

I consider this two-stage process, first getting the program to work
quickly and then refining it, much better than a pure message-passing
implementation.

-Kamran
