Re: MOSIX and kernel mods.

Richard Solis (mail109988@pop.net)
Thu, 04 Mar 1999 22:29:20 -0500


Richard,

I think you are missing the point.

The point is that there exists a class of problems that are very well suited
to the use of a distributed model.

Think of it this way: you can ask 12 people to come help you move but not to
help you read a book.

The reasons are obvious: one is essentially a parallel activity where most of
the work can proceed independently of what else is going on. The other is
essentially a serial activity where you get better results just doing it
yourself.

For those applications where a distributed model can be used to scale an
application up, facilities to help provide that model are useful.

Richard Gooch wrote:

> Kamran Karimi writes:
> >
> > On Fri, 5 Mar 1999, Richard Gooch wrote:
> > > Distributed shared memory is a fundamentally flawed idea. It pretends
> > > to be SMP when it really isn't. With SMP I can have two CPUs writing
> > > to the same page (different cache lines) at the same time and it's
> > > fast.
> >
> > Two CPUs can't write to the same location of the cache at the same
> > time, and in DSM two processes can't write to the same page at the
> > same time. Same inherent limitation.
>
> No, writing to different cache lines (like I said) can happen at the
> same time with a write-back cache (PPro and above). When the CPUs get
> around to writing the cache lines to RAM, one of them will experience
> a delay of a couple of hundred nanoseconds.
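
To make that concrete, a minimal sketch (assuming pthreads and a 64-byte
cache line): two threads hammer on counters that normally land in the same
page but in different cache lines, so on an SMP box with a write-back cache
neither write stalls the other.

    /* sketch: two writers, same page, different cache lines */
    #include <pthread.h>
    #include <stdio.h>

    #define CACHE_LINE 64               /* assumed line size; 32 on a PPro */

    static struct {
        volatile long n;
        char pad[CACHE_LINE - sizeof(long)];
    } counter[2] __attribute__((aligned(CACHE_LINE)));

    static void *writer(void *arg)
    {
        long i, id = (long)arg;

        for (i = 0; i < 10000000; i++)
            counter[id].n++;            /* no lock, no shared cache line */
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;

        pthread_create(&t0, NULL, writer, (void *)0);
        pthread_create(&t1, NULL, writer, (void *)1);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        printf("%ld %ld\n", counter[0].n, counter[1].n);
        return 0;
    }

Under DSM the same code would force the whole page to bounce between nodes
on every change of write ownership.
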
>
> Writing to the same page with DSM requires the page to be moved
> around, which costs a couple of milliseconds (on a fast network).
>
> So we have 4 orders of magnitude difference in cost. A cost difference
> of a few tens of percent and you could still say the two cases are
> essentially the same kind of operation. With 4 orders of magnitude,
> you have something that is very different, and should not be
> considered to be even remotely the same.
>
> > > With DSM, you've got MMU tricks to only allow one machine at a time to
> > > write. If one CPU writes and then the other reads, you've got to
> > > shift the page across the network.
> >
> > This is the first time I've heard that being slower is a sign of
> > being fundamentally flawed.
>
> The flaw isn't that shifting data around a network is slow. The flaw
> is that you provide a programming model that attempts to emulate a
> class of hardware where that emulation will run orders of magnitude
> slower, and the danger that programmers will ignore the difference
> between real SHM and DSM.
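
The MMU trick in question usually amounts to keeping remote-owned pages
protected and trapping the fault. A minimal user-level sketch of the local
half, where fetch_page_from_owner() is a purely hypothetical stand-in for
the network transfer:

    /* sketch: the MMU side of a user-level DSM write fault (Linux, SIGSEGV) */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <string.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>

    static long page_size;

    /* hypothetical: would pull the page from its current owner over the
       network -- this is the part that costs milliseconds on a LAN */
    static void fetch_page_from_owner(void *page)
    {
        (void)page;
    }

    static void dsm_fault(int sig, siginfo_t *si, void *ctx)
    {
        void *page = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(page_size - 1));

        (void)sig; (void)ctx;
        fetch_page_from_owner(page);
        /* we own the page now: allow local reads and writes again */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
        /* the previous owner would downgrade its mapping to PROT_NONE */
    }

    void dsm_init(void)
    {
        struct sigaction sa;

        page_size = sysconf(_SC_PAGESIZE);
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = dsm_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }

Every write to a page you don't own takes that fault path, which is why two
nodes banging on the same page is so painful.
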
>
> > > With SMP, a lock is cheap. With DSM, a lock is expensive.
> >
> > I guess most people know this. I would use a semaphore for
> > synchronization in a DSM application.
>
> A semaphore is a lock. So is a mutex. The point is that locks (of any
> type) are expensive on a distributed machine, so their use should be
> discouraged. Strongly. In fact they should not exist.
>
> Locks are only meant to be used when they're cheap. When they're
> expensive, you should code the application in a different way to avoid
> locking.

Locks are meant to serialize access to data. If your code is locking
everything in sight, it will NEVER run well in a distributed environment,
where the whole point is to avoid working on the same data.
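
For reference, the kind of lock in question is a SysV semaphore used as a
mutex, the sort of primitive DIPC distributes across the network. A minimal
sketch, error handling omitted:

    /* sketch: a SysV semaphore used as a mutex */
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    union semun {                        /* caller-defined, per semctl(2) */
        int val;
        struct semid_ds *buf;
        unsigned short *array;
    };

    int lock_create(key_t key)
    {
        int id = semget(key, 1, IPC_CREAT | 0600);
        union semun arg;

        arg.val = 1;                     /* 1 = unlocked */
        semctl(id, 0, SETVAL, arg);
        return id;
    }

    void lock(int id)
    {
        struct sembuf op = { 0, -1, 0 }; /* P: take the lock or sleep */
        semop(id, &op, 1);
    }

    void unlock(int id)
    {
        struct sembuf op = { 0, +1, 0 }; /* V: release, wake a waiter */
        semop(id, &op, 1);
    }

On an SMP box each semop() is a quick system call; on a cluster the same
call can mean a round trip to whichever node holds the semaphore, so a lock
taken in an inner loop turns into a stream of network messages.
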

>
> > > > Interesting. This is like saying it is wrong to let the OS do
> > > > task-switching and create the illusion that more than one task is
> > > > running, while in reality there is only one CPU. Why not make the
> > > > programmers deal with such issues themselves? They have to make sure
> > > > their programs stop at the specified time intervals so that other
> > > > apps get a chance to run.
> > >
> > > Your analogy is broken. Multitasking has hardware support and is a
> > > cheap operation. DSM has no hardware support and is an expensive
> > > operation.
> >
> > Can be done with no hardware support. Look at the Amiga OS that ran
> > on a little MC68000. They put support for it in the hardware because
> > most OS designers needed it. That's it.
>
> No, you *need* hardware support for multitasking. The timer interrupt
> is essential. Without that the performance would be hideous.
>
> Hardware support for DSM would be a very high speed network (GB/s)
> with low latencies (microseconds) and scatter/gather DMA into memory.
>
> > > Even on machines which do have hardware support (SGI's Origin series),
> > > it's my understanding that getting page migration to work properly has
> > > been a problem.
> >
> > I don't know about that case. It works properly under DIPC.
>
> You've missed the point. The point is that SGI tried to do page
> migration (classic DSM stuff), and they added the hardware to support
> that (those fancy Cray links). And yet they had trouble getting decent
> performance.
>
> And you're trying to get decent performance with DSM on a bog-standard
> cluster of networked computers? Good luck.
>
> > > > And why stop here? Why not let them handle their own memory
> > > > management? After all, they know more about their memory needs than
> > > > the OS! Let them use overlaying to save parts of their memory space
> > > > they _know_ will not be used in the near future out to a hard disk. This
> > > > will improve performance because the OS is too dumb to make a good
> > > > decision in many cases. The possibilities are limitless here.
> > >
> > > Yeah, right. Let's rewrite the kernel in Java: it provides such nice
> > > abstractions.
> > >
> > > Please.
> >
> > I thought you would like this approach!
>
> By using an OS and APIs that make life easier, we're giving up a small
> amount of performance (< 10% in general). That's worth it. But a
> scheme that gives up 4 orders of magnitude of performance is not worth
> it.
>
> > > > The trend in computer science has been to add more and more
> > > > abstraction layers, so as to reduce the complexity of the
> > > > application development and maintenance times.
> > >
> > > My point is that DIPC/DSM is the *wrong* abstraction. Same goes for
> > > MOSIX.
> >
> > IMO, your reasons are not very valid.
>
> I know that you can write some DSM-friendly applications that will
> work well with DSM. But the problem is that you can only do that by
> understanding how DSM works. You have the advantage of having
> implemented it.
>
> But someone who takes their standard threaded code for a SMP machine
> and uses DSM instead is likely to have it run slower than on a single
> machine.
>
> So you've provided a seductive programming environment with a huge
> trap for the unwary. That's what I disagree with.
>
> > > > Performance seems to have been a secondary issue.
> > >
> > > Sure, in *computer science*. CS labs around the world have been
> > > producing new operating systems for decades, each one more abstract
> > > than the ones before. Now tell me how many of them are actually used
> > > in the real world?
> >
> > Nearly all of the things you are using now were first developed in
> > CS labs. Some things catch on and some don't. In most cases
> > non-technical reasons are involved in a concept's widespread usage.
>
> I can't think of any CS invention that is used in the real world that
> has cost 4 orders of magnitude in performance.
>
> > > You don't need DSM to provide a nice and easy programming environment.
> >
> > DSM may not be the only way, but it sure is nice and easy to use.
>
> Which is part of why I think it's a bad idea. If it were nice and easy
> to use and also efficient, I'd be all for it.
>
> > > DIPC (as in: strict message-passing) is one thing. Distributed shared
> > > memory is another. Two nodes banging on the same page just isn't going
> > > to run well with DSM.
> >
> > Agreed. The programmer has to be careful to avoid such pitfalls.
> > The good thing is that DIPC will work even if the
> > application behaves in such ways. The programmer can always go back
> > and improve on his already-working app.
>
> But you're still missing the point. Banging on the same page and
> grabbing locks quickly is perfectly legitimate on an SMP machine.
> There's nothing wrong with coding that way.
>

Except for my previous comment: a program written so that all the processors
are fighting to access the same data would run better on a uniprocessor
machine.
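
The usual fix is to partition the work so each processor mostly touches its
own data, and shared state is touched only once at the end. A minimal
pthreads sketch of that shape (the same partitioning is what makes the
distributed case workable at all):

    /* sketch: partition the data instead of locking a shared total */
    #include <pthread.h>
    #include <stdio.h>

    #define N       1000000
    #define WORKERS 4

    static double data[N];

    struct slice { int lo, hi; double sum; };

    static void *worker(void *arg)
    {
        struct slice *s = arg;
        int i;

        for (i = s->lo; i < s->hi; i++)
            s->sum += data[i];          /* private to this worker: no lock */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[WORKERS];
        struct slice s[WORKERS];
        double total = 0.0;
        int i;

        for (i = 0; i < N; i++)
            data[i] = 1.0;

        for (i = 0; i < WORKERS; i++) {
            s[i].lo = i * (N / WORKERS);
            s[i].hi = (i + 1) * (N / WORKERS);
            s[i].sum = 0.0;
            pthread_create(&t[i], NULL, worker, &s[i]);
        }
        for (i = 0; i < WORKERS; i++) { /* one combine at the end, not a lock per element */
            pthread_join(t[i], NULL);
            total += s[i].sum;
        }
        printf("total = %f\n", total);
        return 0;
    }

Whether the workers are threads on one box or processes on different nodes,
the shared result is only touched once, at the join.
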

>
> If you provide a feature that allows people to use a perfectly valid
> technique but requires they be aware of new pitfalls, then you need to
> seriously question whether it is sensible to provide that feature.
>
> > > No, go back and read what I said. DSM is the *wrong* abstraction. DIPC
> > > without the DSM is OK (as long as the kernel isn't bloated with it:
> > > do it in the library).
> >
> > That is your opinion, and believe me, you'll have a hard time
> > convincing me of that.
>
> And vice versa :-)
>
> But ask yourself who you are writing this for. If it's for yourself
> and others who know the pitfalls, then fine (but don't bloat the
> kernel just for that). If it's for your average thread programmer who
> wants to run their efficient parallel codes on a cluster, then I think
> you're creating problems which weren't there before. Those people
> would be better served with an abstraction that doesn't pretend to be
> something it really isn't and doesn't seduce people into the tarpit.
>
> Regards,
>
> Richard....
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/