Re: Process migration and load balancing in clusters

Keith Rohrer (kwrohrer@uiuc.edu)
Thu, 14 Nov 1996 16:43:01 -0600


Nathan Bryant wrote:
>
> I've always been fascinated with distributed systems (such as Sprite) that
> can handle transparent process migration and load balancing in a cluster
> of workstations. The kernel modifications for doing this on Linux are not
> exactly trivial, of course; it would probably require changing a lot of
> internal interfaces. But I've been toying with the idea of experimenting
> with this, either on Linux or some other system, perhaps Mach-based.
> Anybody given any thought to this sort of thing?
Actually, the problems of process migration in a homogeneous network are
very similar to the "transparent checkpoint and resume" issues which
were discussed a few days ago. Migrating from CPU to CPU in an SMP
environment shouldn't be hard at all; the same goes for suspending and
resuming, provided the gap is too short to cause timeouts and doesn't
span a reboot. Migratable programs under Linux could be done--perhaps
even at the user level--with the following restrictions:

* Open files, and the state thereof, must be preserved during the
suspend/migration; for migration the entire filesystem would probably
have to be, or at least appear, largely identical on both hosts.
* Network connections are even harder to migrate: they must either be
bounced off some form of masquerading at their original host, or some
sort of protocol must tear them down and recreate them after the
migration/resume completes (or both).
* Processes using resources peculiar to the local machine (e.g. doom or
anything else running with MITSHM; svgalib apps; dosemu; X at all if
migrating across a slow link) shouldn't migrate.
* Many things therefore won't be suitable for migration in a
distributed semi-homogeneous UN*X environment, and many semi-portable
apps will be migratable only at certain points.
* The way Linux handles shared libraries, restoration of clean code
pages from disk, etc., suggests that migrating or suspending/resuming
a process would only entail migrating/saving the dirty pages, pcb/vm
data, and resource states; if migration merits being done, it can be
done quickly.
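
As a rough illustration of the first restriction, here is a minimal
user-level sketch of saving and restoring the state of one open regular
file. It is only a sketch under assumptions: the helper names
snapshot_fd and restore_fd are hypothetical, the caller is assumed to
know the path the descriptor was opened with, and that path is assumed
to name the same file on the destination host. Pipes, sockets, locks,
and unlinked-but-open files are not handled.

```python
import fcntl
import os


def snapshot_fd(fd, path):
    """Record what is needed to reopen a regular file later.

    `path` must be supplied by the caller; this sketch assumes the same
    path is valid on the host where the process resumes.
    """
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)   # access mode + status flags
    offset = os.lseek(fd, 0, os.SEEK_CUR)    # current file position
    return {"path": path, "flags": flags, "offset": offset}


def restore_fd(state):
    """Reopen the file and seek back to the saved position."""
    # Defensive: never re-create or truncate the file on resume.
    flags = state["flags"] & ~(os.O_CREAT | os.O_EXCL | os.O_TRUNC)
    fd = os.open(state["path"], flags)
    os.lseek(fd, state["offset"], os.SEEK_SET)
    return fd
```

The restored descriptor will generally have a different number than
the original, so a real implementation would also need something like
dup2() to put it back in the slot the program expects.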

Keith

-- 
The priests and the friars/Behold me in dread
Because I still love you,/My love, and you're dead.
    ---Dead Can Dance, "I Am Stretched On Your Grave", based on King/S.
	O'Connor's rewrite of "The Unquiet Grave", trad. Irish folk.