Re: Kernel-level checkpointing

From: Ondrej Feela Filip (feela@ipex.cz)
Date: Tue May 16 2000 - 02:23:35 EST


On Sun, 14 May 2000, Martin Mares wrote:

> On 2000-05-14T13:54:21,
> Rayson Ho <ut_bookstore@yahoo.com> said:
>
> > I want to develop a user-level application for
> > fault-tolerant servers. Can someone tell me where I
> > can get information about kernel-level
> > checkpointing (i.e., writing the image and state of a
> > process to disk so that another computer can re-run
> > that process)?
> >
> > APIs, kernel source, project URLs, etc. would be very
> > useful.
>
> This hasn't been developed yet. All the solutions I have seen so far implement
> it in the application, which saves its state at regular intervals. I assume
> this will be the most efficient solution in any case.

Please look at ftp://ftp.gin.cz/pub/local/feela/src/freezer.tgz. It's
alpha-quality, but it may help you.
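
For comparison, the application-level approach described in the quoted text
(the program saves its own state at regular intervals and resumes from the
last saved state) can be sketched roughly as below. This is only an
illustration and says nothing about how freezer works. The function names,
the JSON file format and the toy workload (summing integers) are all
assumptions made for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically, so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp, path)     # rename over the old checkpoint
    except BaseException:
        os.unlink(tmp)
        raise


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.

    stop_after (hypothetical, for demonstration) simulates a crash after
    that many steps of this run; work since the last checkpoint is lost.
    """
    state = load_checkpoint(path, {"next": 0, "acc": 0})
    done = 0
    while state["next"] < total_steps:
        if stop_after is not None and done >= stop_after:
            return None  # simulated failure
        state["acc"] += state["next"]
        state["next"] += 1
        done += 1
        if state["next"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["acc"]
```

If the checkpoint file lives on storage that another machine can read, that
machine can call run() with the same path and continue from the last saved
step. The obvious limitation is that the application has to know which parts
of its state matter, which is exactly what a kernel-level mechanism would
avoid.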

>
> A kludge I am toying with in my mind would be to take entire system snapshots
> ("suspend to swap") and restart those on another machine when one fails.
>
> User Mode Linux may also be helpful here.
>
> If you are looking for a serious, not-as-crazy starting point though, you may
> want to look at MOSIX - for process migration, they are facing similar
> issues.
>
> Now, adding generic process checkpointing combined with MOSIX, that would be
> an awesome HA HPC cluster framework...
>
> Sincerely,
> Lars Marowsky-Brée <lmb@suse.de>
> Development HA
>
>

-- 
Ondrej Feela Filip
E-mail: feela@ipex.cz
WWW: http://feela.network.cz
PGP: finger feela@atrey.karlin.mff.cuni.cz



