Re: Core Dumps & Restarting

Jason McMullan (jmcc@yop.vi.ri.cmu.edu)
1 Nov 1996 03:55:27 GMT


Timothy Peters (tim.peters@nene.ac.uk) wrote:
: I am currently investigating the possibility of a research project in
: this area (Resistant Storage in Operating System Recovery).

: I will be using Linux as a base for my work, which will involve trying to
: get a system to halt and restart as if nothing had happened, and also the
: possibility of creating two independent machines that act as one, so that
: if one crashes the other can pick up and continue without the
: applications noticing.

: If anybody has any ideas or suggestions, or knows of any other work
: that is being done in this field, please let me know.

First off - don't use Intel Linux - use SPARC Linux (or even
Atari Linux) instead. Intel machines are too hardware happy
(read 'unknown hardware states') for that to work.

But some issues you will have to address:
* SIGALRMs that are late
  (and a whole slew of other timing problems)
* Cache flushing issues
  (what was mmap()ed, what was in swap, etc)
* Dropped network connections
  (an almost impossible problem w/out _very_ special
  hardware to keep TCP/IP connections alive)
* Unknown hardware states
  (I lost power in the middle of getting a packet
  from the SCSI HD - do I return an error on
  bootup, or retry? - repeat for _all_ devices)
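
As a rough illustration of the last item, here is a minimal sketch
(in Python, purely for readability) of one possible way to approach the
error-or-retry question: log each device request to stable storage before
issuing it, log its completion afterwards, and on bootup look at what was
issued but never completed. This is a toy under stated assumptions, not
anything taken from the Linux kernel. Request, recover, the journal format
and the "idempotent" flag are all hypothetical names made up for the example,
and it assumes the journal itself survives the crash intact.

```python
# Hypothetical sketch: every name below is invented for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    req_id: int
    device: str        # e.g. "sda", "eth0"
    op: str            # e.g. "read", "write", "send"
    idempotent: bool   # assumed: safe to repeat without changing the outcome


def recover(journal):
    """Decide what to do with requests that were in flight at the crash.

    journal is a list of ("issue", Request) and ("done", req_id) records,
    in the order they were written to stable storage.
    Returns a list of (action, Request), where action is "retry" or "error".
    """
    pending = {}
    for kind, payload in journal:
        if kind == "issue":
            pending[payload.req_id] = payload
        elif kind == "done":
            pending.pop(payload, None)
    # Anything still pending was interrupted: repeat it only if that is safe.
    return [("retry" if r.idempotent else "error", r) for r in pending.values()]


if __name__ == "__main__":
    journal = [
        ("issue", Request(1, "sda", "read", True)),
        ("done", 1),
        ("issue", Request(2, "eth0", "send", False)),
        ("issue", Request(3, "sda", "read", True)),
    ]
    for action, req in recover(journal):
        print(action, req.device, req.op, req.req_id)
```

Even in this toy the hard part is left out: deciding which operations
really are safe to repeat, for _each_ device, and getting the device itself
back into a known state before repeating them.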

I wish you luck... This is a PhD thesis, right?
(you'll need the years... or a lot of grad students)

Jason McMullan - Research Programmer, Robotics Institute, CMU

Me: http://www.ul.cs.cmu.edu/~jmcc
Linux GGI: http://synergy.caltech.edu/~ggi