That is all very nice. However, before making it that sophisticated, I
would suggest first making it handle errors more reasonably
(i.e. don't panic the system or hang processes when a block can't be read).
> It would be nice if you could get the messages printed on a fixed device
> directly by the kernel, so that you could send them to the first console
> (instead of the current one), to a terminal on a serial port, to a
> printer, etc. That would make them less dependent on complex stuff like
> syslogd and X.
>
> This doesn't require kernel changes; you just need to make source
> changes to klogd. This would work in most cases, where process
> scheduling is still working. It certainly works in the disk I/O error
> case mentioned above; you just have to instruct klogd to write kernel
> log messages to a device, instead of or in addition to forwarding the
> kernel message to syslogd.
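
To make the quoted suggestion concrete, here is a rough sketch of the core loop such a daemon would need: read kernel messages from one descriptor and copy them unchanged to a fixed output device. This is not klogd's actual source. The function names are made up for illustration, and where the messages come from (for example /proc/kmsg) and where they go (for example the first console or a serial port) are assumptions left to the caller.

```c
#include <errno.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything readable from in_fd to out_fd
 * until end of file. In a klogd-like daemon, in_fd would be the kernel
 * log source and out_fd a fixed device (both are assumptions here).
 * Returns the number of bytes forwarded, or -1 on error. */
long forward_messages(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;           /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

A caller would open the log source and the target device itself and pass the two descriptors in. Note that this loop still lives in a user process, so it only helps while that process can be scheduled and is not waiting on the failing disk.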
I'm not sure what klogd is; my system currently runs only syslogd.
Is there an advantage to getting and running klogd? (I don't have it.)
Note that when the disk is dead, processes that are not very active and
suddenly need to do something stand a high chance of failing,
because they have been swapped out.
I know it is against the "do it in user mode" religion, but I really
think that critical error message printing should be done in the kernel,
not in a user process that probably dies with the system.
> In the case where I'm doing kernel work, and where I'm afraid a device
> driver bug that I'm working on might cause the system to hang at the
> interrupt level, I generally avoid working using X11 at all; I'll then
> kill off klogd, or run klogd -c 8, so that all kernel messages go to the
> current console, where I'm guaranteed to get the message even if the
> system is hung.
That only helps when you are doing debugging work.
(Even then, I would not like to be without X.)
Problems with the disks or the SCSI bus can occur at any time, usually
when you least expect or want them. I am uneasy that I don't see the
error messages, and that filesystems get corrupted more than is
strictly necessary.
(e.g. an error on /dev/sda1 will corrupt a filesystem on /dev/sda2, only
because you can't sync the system properly before rebooting it)
Sure, all kinds of nice solutions are possible, but why can't we report
device errors back to the user program, as all other operating systems
seem to be able to do? (And leave the system in a state where it can
be safely shut down.)
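
As a sketch of what this would mean on the user-program side: POSIX specifies that read() fails with EIO on a physical I/O error, so a program can check for it and carry on. The helper names below are made up for illustration, and the sketch assumes the kernel actually returns the error instead of panicking or leaving the process hanging.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read up to len bytes, retrying on EINTR and short reads.
 * Returns the number of bytes read (less than len only at end of file),
 * or a negative errno value, e.g. -EIO when the device could not
 * deliver the block. */
long read_checked(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            return -(long)errno;    /* hand the error to the caller */
        }
        done += (size_t)n;
    }
    return (long)done;
}

/* Hypothetical caller: report the failure and keep running, so the
 * rest of the program (and the system) can still be shut down cleanly.
 * Returns 0 on success, -1 on error or short read. */
int load_record(int fd, void *buf, size_t len)
{
    long n = read_checked(fd, buf, len);

    if (n < 0) {
        fprintf(stderr, "read failed: %s\n", strerror((int)-n));
        return -1;
    }
    if ((size_t)n < len) {
        fprintf(stderr, "short read: got %ld of %lu bytes\n",
                n, (unsigned long)len);
        return -1;
    }
    return 0;
}
```

The point is only that the failure arrives as an ordinary return value; what the program then does (retry, skip the record, exit) is its own decision.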
Rob
--
+------------------------------------+--------------------------------------+
| Rob Janssen         rob@knoware.nl | AMPRnet:   rob@pe1chl.ampr.org       |
| e-mail: pe1chl@wab-tis.rabobank.nl | AX.25 BBS: PE1CHL@PI8WNO.#UTR.NLD.EU |
+------------------------------------+--------------------------------------+