What you want is checkpoint-restart, not clustered process migration.
Note that the process migration that's been talked about here redirects
stateful system calls (file I/O, getpid(), etc.) to the originating
processor, and that won't survive a reboot of the originating processor
very well.
Condor already does checkpoint-restart within certain constraints.
It's very hard to do checkpoint-restart in the completely general case:
how do you maintain existing network connections when you transfer a
process to another CPU or reboot your system? There's a lot of state
to save...
>Most of this could be accomplished in user space, I'd think.
Arbitrary checkpoint-restart needs kernel help. Limited checkpoint-restart
already exists in user space.
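To show what "limited" means, here is a minimal sketch of the simplest user-space form: application-level checkpointing, where the program cooperates by saving its own state. It is not Condor's mechanism. The function names `run_sum` and `_save`, the JSON checkpoint file, and the `stop_at` crash-simulation parameter are all hypothetical. The loop periodically writes its progress atomically (temp file, then `os.replace`) and resumes from that file on restart.

```python
import json
import os


def _save(path, state):
    """Atomically write a checkpoint: write a temp file, then rename over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data hits disk before the rename
    os.replace(tmp, path)  # atomic rename on POSIX (same filesystem)


def run_sum(n, ckpt_path, every=1000, stop_at=None):
    """Sum 0..n-1, checkpointing progress to ckpt_path every `every` steps.

    Only the loop variables (next index, running total) are saved. Open
    files, sockets, the PID, etc. are NOT captured, which is why this works
    only for programs written to cooperate.

    stop_at (hypothetical, for demonstration) simulates a crash after that
    many iterations in this run; the function then returns None.
    """
    # Resume from a previous checkpoint if one exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"i": 0, "total": 0}

    i, total = state["i"], state["total"]
    done = 0
    while i < n:
        if stop_at is not None and done == stop_at:
            return None  # simulated crash: work since the last checkpoint is lost
        total += i
        i += 1
        done += 1
        if i % every == 0:
            _save(ckpt_path, {"i": i, "total": total})

    # Finished: the checkpoint is no longer needed.
    if os.path.exists(ckpt_path):
        os.remove(ckpt_path)
    return total
```

If the run is "killed" after 2,500 iterations and then restarted, only the 500 iterations since the last checkpoint are redone. Anything the kernel holds on the process's behalf, such as an open network connection, is not saved. Capturing that in general is the part that needs kernel help.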
>> As has been repeatedly proven, moving an already started process is a lose
>> almost 100% of the time.
>
>Right. Unless somebody just typed shutdown -hnow at the # prompt.
Not even then.
michaelkjohnson
"Magazines all too frequently lead to books and should be regarded by the
prudent as the heavy petting of literature." -- Fran Lebowitz