Floppies are too slow and they're not suitable for unattended dumps. You
don't want your ISP's PPP server to sit there for hours waiting for
somebody to swap floppies.
> 2) Swap partition.
That's reasonable, especially if you don't have lots of spare disk space.
(You can gzip the crash dump when copying it to a regular file.) Of course
you'll lose some state information, so this shouldn't be the only choice.
> 3) Crash partition. Do a crash dump to a special partition. For the
> ultra paranoid, this can be on a separate drive.
Or a crash file. This can be made as reliable as you want.
> I don't actually see the crash dump as being a terribly difficult thing
> to write; might be a good first kernel hack project for someone. :)
It's a bit more involved, because there are many parts of the existing
kernel that you shouldn't use. I still believe that using the BIOS is the
best solution, because:
- you don't depend on kernel data structures being correct
- you don't change kernel data structures (e.g. writing to a dump file or
  partition may use timers, allocate kernel memory, wait for locks, etc.)
- you don't depend on the disk driver working (highly desirable if it's
  that driver that's causing problems)
Naturally, there are a few disadvantages:
- you have to be able to figure out where the data should go, which means
  a) mapping partition/file offsets to disk positions, and b) translating
  them to addresses understood by the BIOS
- you have all the usual BIOS restrictions (1024 cylinders, only 2
  drives, etc.)
- you have to be able to go back cleanly to the BIOS
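
To illustrate point b) and the 1024-cylinder restriction, here is a rough
sketch of translating a linear sector number into the cylinder/head/sector
form that the BIOS disk calls expect. It assumes the geometry has already
been obtained from the BIOS (e.g. via INT 13h, AH=08h). The names
bios_geometry, chs and lba_to_chs are made up for this example and are not
taken from LILO or the kernel.

```c
#include <stdint.h>

/* Hypothetical: drive geometry as reported by the BIOS. */
struct bios_geometry {
    unsigned heads;    /* number of heads, 1..256 */
    unsigned sectors;  /* sectors per track, 1..63 */
};

/* Hypothetical: a disk address in the form the BIOS understands. */
struct chs {
    unsigned cylinder; /* 0..1023 */
    unsigned head;     /* 0..heads-1 */
    unsigned sector;   /* 1..sectors (sector numbers are 1-based) */
};

/*
 * Translate a linear sector number into a CHS address.
 * Returns 0 on success, -1 if the geometry is unusable or the sector
 * lies beyond cylinder 1023 and therefore can't be reached.
 */
int lba_to_chs(uint32_t lba, const struct bios_geometry *g, struct chs *out)
{
    uint32_t track, cylinder;

    if (g->heads == 0 || g->heads > 256 ||
        g->sectors == 0 || g->sectors > 63)
        return -1;

    track = lba / g->sectors;       /* tracks counted across all heads */
    cylinder = track / g->heads;
    if (cylinder >= 1024)
        return -1;                  /* out of reach of the BIOS */

    out->cylinder = cylinder;
    out->head = track % g->heads;
    out->sector = lba % g->sectors + 1;
    return 0;
}
```

With 16 heads and 63 sectors per track, everything from linear sector
1032192 (= 1024*16*63) onwards is rejected, so a dump area located there
would have to be refused when it is set up, not at crash time.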
I think that the mapping and detection of BIOS restrictions should be
handled by LILO, because that's what it already does. So the kernel part
would have to incorporate a few hundred bytes of LILO code, plus it would
have to read the map file, plus it would have to know how to fall back to
a system state where the BIOS can work.
The relevant code for LILO already exists; you can find it in
lrcftp.epfl.ch:/pub/people/almesber/lilo/lilo.pre17.tar.gz
There I tried to do dumps after the reboot. Unfortunately, it turned out
that most common BIOSes clear or overwrite the memory, so that there's
nothing left to dump. So what would have to be done is to take that code
and put it into the kernel. Then add the fallback to the BIOS and give
LILO some means to pass the start sector of the dump file's map section
to the kernel it's booting. That's it.
As a future extension, one could also add the ability to register dump
files on the fly. (Having the dump file available right after the kernel
is booted is more useful, though, because you may crash before you get to
a shell.)
About security: I'm accessing each sector three times:
- first, I read it and check whether it contains the right sequence
  number and a known pattern that fills the rest of the sector (this
  reduces the probability of destroying data due to a bad mapping to
  about 2^-4096)
- second, I read it again (and check it), because the first read may
  have corrupted some system variables. Not very likely, but that's
  what paranoia is for
- third, I write the dump data
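
The three steps can be sketched roughly as follows. This is only an
illustration, not the code from the LILO archive: the sector layout (a
4-byte little-endian sequence number followed by a fill byte) and the
names FILL_BYTE, make_marker, marker_ok and dump_sector are assumptions,
and an in-memory array stands in for the BIOS read and write calls. The
2^-4096 figure presumably comes from a 512-byte sector having 4096 bits.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define FILL_BYTE   0xA5   /* hypothetical "known pattern" */

/* Fill a sector the way a (hypothetical) setup tool would pre-mark it. */
void make_marker(uint8_t *sec, uint32_t seq)
{
    memset(sec, FILL_BYTE, SECTOR_SIZE);
    sec[0] = seq & 0xff;
    sec[1] = (seq >> 8) & 0xff;
    sec[2] = (seq >> 16) & 0xff;
    sec[3] = (seq >> 24) & 0xff;
}

/* Non-zero if the sector carries exactly the marker for sequence seq. */
int marker_ok(const uint8_t *sec, uint32_t seq)
{
    uint8_t expect[SECTOR_SIZE];

    make_marker(expect, seq);
    return memcmp(sec, expect, SECTOR_SIZE) == 0;
}

/*
 * Write one sector of dump data to position pos, but only after the
 * marker has been read and verified twice. Returns 0 on success and -1
 * (leaving the sector untouched) if either check fails.
 */
int dump_sector(uint8_t (*disk)[SECTOR_SIZE], uint32_t pos, uint32_t seq,
                const uint8_t *data)
{
    uint8_t buf[SECTOR_SIZE];
    int pass;

    for (pass = 0; pass < 2; pass++) {
        memcpy(buf, disk[pos], SECTOR_SIZE);  /* stands in for a BIOS read */
        if (!marker_ok(buf, seq))
            return -1;                        /* bad mapping: don't write */
    }
    memcpy(disk[pos], data, SECTOR_SIZE);     /* stands in for a BIOS write */
    return 0;
}
```

A sector that holds foreign data, or the marker of a different sequence
number, is never overwritten, which is the whole point of the exercise.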
All this is very slow, but speed can be improved by:
- reading and writing larger blocks of data, e.g. by making the I/O
  area size configurable
- doing the second read only a few times, e.g. only for the first two
  sectors
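
To get a feeling for what these two changes buy, one can simply count BIOS
calls. The sketch below is back-of-the-envelope arithmetic only: it assumes
the dump area is contiguous, so that every call can transfer a full I/O
area, and the names blocks and bios_calls are invented for the example.

```c
#include <stdint.h>

/* Number of calls needed to move `sectors` sectors, `per_call` at a time. */
static uint32_t blocks(uint32_t sectors, uint32_t per_call)
{
    return (sectors + per_call - 1) / per_call;
}

/*
 * Total BIOS calls for a dump of `total` sectors when each call moves
 * `per_call` sectors and only the first `recheck` sectors get the
 * second (paranoia) read.
 */
uint32_t bios_calls(uint32_t total, uint32_t per_call, uint32_t recheck)
{
    if (per_call == 0)
        per_call = 1;               /* treat a zero I/O area as one sector */
    if (recheck > total)
        recheck = total;

    return blocks(total, per_call)      /* first read and check */
         + blocks(recheck, per_call)    /* second read */
         + blocks(total, per_call);     /* write */
}
```

For a 4 MB dump (8192 sectors), going sector by sector with a full second
read takes bios_calls(8192, 1, 8192) = 24576 calls, while a 16-sector I/O
area with the second read limited to the first two sectors takes
bios_calls(8192, 16, 2) = 1025 calls.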
I doubt I'd have time to work on this in the near future (well, maybe
over Xmas ... :), so if somebody else is interested in picking it up,
please feel encouraged to do so.
- Werner
--
  _________________________________________________________________________
 / Werner Almesberger, DI-LRC,EPFL,CH   werner.almesberger@lrc.di.epfl.ch /
/_IN_R_311__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/