Linux kernel and disaster recovery.

Tigran A. Aivazian (TIGRANA@DSTIUK.CCMAIL.CompuServe.COM)
18 Jun 97 04:49:37 EDT


Dear Linux Kernel Community,

Imagine a set of 127 PCs, each running one of the windoz flavors of OS and
telnetting to a server running Linux. In each session they run a program that
creates an extra master/slave pty pair, forks and execs /bin/login as a child
on the slave, and connects the master to a FIFO or a UNIX domain socket. From
another session one can then easily attach to the first session (and snoop on
it; yes, the idea is exactly the same as ttysnoop(8), only the implementation
is slightly different).
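
To make the session program described above concrete, here is a minimal
userspace sketch. It is only an illustration under stated assumptions, not the
actual program: it is written in Python for brevity, the names start_session
and relay and the socket path argument are hypothetical, and the command to run
is a parameter (a real setup would exec /bin/login, which needs root).

```python
import fcntl
import os
import pty
import select
import socket
import termios


def start_session(sock_path, argv):
    """Run argv on a fresh pty slave and listen on a UNIX domain socket.

    Hypothetical helper. Returns (child_pid, master_fd, listener).
    """
    master_fd, slave_fd = pty.openpty()       # the extra master/slave pair
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    if os.path.exists(sock_path):
        os.unlink(sock_path)
    listener.bind(sock_path)
    listener.listen(1)

    pid = os.fork()
    if pid == 0:                              # child: ends up as argv
        try:
            listener.close()
            os.close(master_fd)
            os.setsid()                       # new session, no controlling tty
            fcntl.ioctl(slave_fd, termios.TIOCSCTTY, 0)  # slave becomes it
            for fd in (0, 1, 2):
                os.dup2(slave_fd, fd)
            if slave_fd > 2:
                os.close(slave_fd)
            os.execv(argv[0], argv)
        finally:
            os._exit(127)                     # reached only if setup or exec failed
    os.close(slave_fd)                        # parent keeps only the master
    return pid, master_fd, listener


def relay(master_fd, conn):
    """Copy bytes between the pty master and one attached client.

    Returns "detached" if the client went away, "exited" if the child did.
    """
    while True:
        readable, _, _ = select.select([master_fd, conn], [], [])
        if master_fd in readable:
            try:
                data = os.read(master_fd, 4096)
            except OSError:                   # Linux gives EIO once the slave is closed
                data = b""
            if not data:
                return "exited"
            conn.sendall(data)
        if conn in readable:
            data = conn.recv(4096)
            if not data:
                return "detached"
            os.write(master_fd, data)
```

The parent would call listener.accept() and pass the resulting connection to
relay(). When relay() returns "detached" the child is still alive on the slave
side, so a later accept() followed by another relay() call resumes the same
session. While nobody is attached, output simply queues in the pty buffer, and
the child may block once that buffer fills.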

Now if the PC loses its connection to the Linux server, that's fine: one can
reattach to the session and resume it.

Now, from this point on please be gentle, because what I am about to say is
only a vague idea...

What if the Linux server itself crashes? Suppose it is on a UPS and there is
some clever kernel module that can somehow save the state of all (or specific)
running processes to a separate disk partition. After a reboot, it would
restore that "memory dump" from the partition into memory, thus revitalising
all those running processes. That would be very nice. Of course, I understand
that the network sockets will be lost, but that is fine: with the scheme
described above one simply reattaches to the sessions through the UNIX domain
socket and resumes them.

Any ideas as to whether this can be implemented at all? I would be interested
in both global solutions (the whole kernel, perhaps even with all the
filesystems) and local ones, where only specified processes are saved and
restored together with the kernel objects they have allocated (IPC,
descriptors, ttys, etc.).

Thank you in advance,
Tigran A. Aivazian.
------------------------

Technical Consultant,
DST International.