On Sun, 14 May 2000, Martin Mares wrote:
> On 2000-05-14T13:54:21,
> Rayson Ho <ut_bookstore@yahoo.com> said:
>
> > I want to develop a user-level application for
> > fault-tolerant servers. Can someone tell me where I
> > can get information about the kernel-level
> > checkpointing (i.e., to write the image and state of a
> > process to disk so that another computer can re-run
> > that process)?
> >
> > APIs, kernel source, project URLs, etc. would be very
> > useful.
>
> This hasn't been developed yet. All the solutions I have seen so far implement
> it in the application, which saves its state at regular intervals. I assume
> this will be the most efficient solution in any case.
Please take a look at ftp://ftp.gin.cz/pub/local/feela/src/freezer.tgz. It is
alpha-quality, but it may help you.
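
For comparison, the application-level approach mentioned in the quoted text
(the program saves its own state at regular intervals) could look roughly
like the sketch below. This is not taken from freezer or from any existing
project: struct app_state, checkpoint_save(), checkpoint_load() and
run_work() are hypothetical names, and the interval is counted in loop
iterations rather than wall-clock time to keep the example short. The state
is dumped as a raw struct, so the file is only usable on a machine with the
same architecture and compiler layout, and no fsync() is done, so it
survives a process crash but not necessarily a power failure.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical application state; a real program would carry much more. */
struct app_state {
    unsigned long iteration;
    double accumulator;
};

/* Write the state to "<path>.tmp", then rename it over the old checkpoint,
 * so a crash in the middle of a write never leaves a truncated file. */
static int checkpoint_save(const char *path, const struct app_state *st)
{
    char tmp[512];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    FILE *f;

    if (n < 0 || n >= (int)sizeof(tmp))
        return -1;
    f = fopen(tmp, "wb");
    if (!f)
        return -1;
    if (fwrite(st, sizeof(*st), 1, f) != 1) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path);
}

/* Read a checkpoint back; *st is left untouched unless the read succeeds. */
static int checkpoint_load(const char *path, struct app_state *st)
{
    struct app_state tmp;
    FILE *f = fopen(path, "rb");
    int ok;

    if (!f)
        return -1;
    ok = (fread(&tmp, sizeof(tmp), 1, f) == 1);
    fclose(f);
    if (!ok)
        return -1;
    memcpy(st, &tmp, sizeof(tmp));
    return 0;
}

/* Toy workload: sum 0..total-1, resuming from the last checkpoint if one
 * exists and saving a new one every `interval` iterations. */
static double run_work(const char *path, unsigned long total,
                       unsigned long interval)
{
    struct app_state st = { 0, 0.0 };

    (void)checkpoint_load(path, &st);   /* no checkpoint: start from zero */
    while (st.iteration < total) {
        st.accumulator += (double)st.iteration;
        st.iteration++;
        if (interval && st.iteration % interval == 0)
            (void)checkpoint_save(path, &st);
    }
    return st.accumulator;
}
```

A restarted copy of the program (possibly on another machine that can see
the same file) would simply call run_work() again with the same path and
continue from the last saved iteration. What this cannot capture is
anything outside the struct, such as open sockets or file offsets, which is
where kernel-level support becomes interesting.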
>
> A kludge I am toying with in my mind would be to take entire system snapshots
> ("suspend to swap") and restart those on another machine when one fails.
>
> User Mode Linux may also be helpful here.
>
> If you are looking for a serious, not-as-crazy starting point though, you may
> want to look at MOSIX - for the process migration, they are facing similar
> issues.
>
> Now, adding generic process checkpointing combined with MOSIX, that would be
> an awesome HA HPC cluster framework...
>
> Sincerely,
> Lars Marowsky-Brée <lmb@suse.de>
> Development HA
>
>
--
Ondrej Feela Filip
E-mail: feela@ipex.cz
WWW: http://feela.network.cz
PGP: finger feela@atrey.karlin.mff.cuni.cz