Linux RAS

From: Keith Owens (kaos@ocs.com.au)
Date: Fri Sep 15 2000 - 07:07:40 EST


On Fri, 15 Sep 2000 10:42:54 +0100,
richardj_moore@uk.ibm.com wrote:
>I think the case for the kernel debugger is better stated as the case for
>RAS (Reliability, Serviceability and Availability) in the kernel [snip]
>Customers knew we never sent
>developers on site to debug OS/390 (or MVS as it was called in those days).
>They also knew that we never rejected a problem because it was not
>re-creatable and we never even asked for a re-creation scenario. The reason
>for this was that we had appropriate RAS capability in MVS which allowed
>data to be captured automatically at fault time combined with a certain
>amount of self-healing capability and automated recovery.

It would be nice if Linux had the same level of RAS as MVS. But
consider the different environments that MVS and Linux run under.

MVS. Standard and external I/O controllers. Fix the data pages,
        build the CCW, issue SIO, forget about it until it interrupts.
        I/O requires little or no OS support, it can even be done from
        bare hardware.

Linux. 50+ different I/O controllers. Most require continual hand
        holding by the operating system. Doing I/O from bare hardware
        is difficult if not impossible.

MVS. Multiple systems keys (0-7) so different system components can
        be segregated. If VTAM (key 7) dies the rest of the OS (keys
        0-6) can run. Multiple system keys stop most runaway
        components from stamping on other component's data. Key
        switching is relatively cheap.

Linux. On ix86, all of the kernel runs in key 0. No protection of one
        part of the kernel from another. Key switching is relatively
        expensive.

MVS. High quality hardware. Processors that can detect their own
        problems, all memory is ECC or better, disks that report
        failures before they occur.

Linux. Most users are on the cheapest possible hardware. No parity on
        memory, processors and disks that just die without warning.
        Lots of random problems, "one bit different, probably bad RAM".

MVS. Trace of interesting events in the master trace table.

Linux. No kernel trace table (oops backtrace does *not* count).

MVS. Only one hardware model - S/390 with minimum hardware
        requirements for each version of OS/390. IBM can get away with
        telling customers "to run OS/390 2.9 you need a G5 processor or
        better".

Linux. 12+ different hardware models, with no restriction on how old
        the hardware is. We cannot tell customers "to run Linux 2.4
        you must have a Pentium III or better".

MVS. If all else fails, MVS can do a stand alone IPL to capture the
        crash data. It requires that a site plan ahead by building
        stand alone IPL text and dedicating an area of disk to receive
        the crash dump but apart from that it always works.

Linux. Assuming that your I/O subsystem will run without a working OS
        (and lkcd shows how difficult that is), you still need a
        mechanism to get the stand alone dump going. On our most common
        platform (ix86) we have to pray that the BIOS will not wipe
        memory before saving the crash dump. If the BIOS is not
        reliable (and most are not) then the stand alone code must be
        built into the kernel and protected somehow. Then you have the
        problem of activating the stand alone dump.

MVS. Decent hardware support for OS tracing, with Program Event
        Registers (PER) that are standardized and easy to use.

Linux. Each platform has different hardware mechanisms for hardware
        debugging. Some have no hardware debugging at all.

MVS. The OS developers know everything about the hardware. They
        even know the undocumented processor and I/O subsystem
        commands.

Linux. Nobody knows everything about every bit of hardware that is
        supported.

MVS. Decent hardware support for cheap cross memory services. OS
        data can be separated into multiple data spaces, separating
        unrelated data.

Linux. Where is data for subsystem XYZ? Somewhere in the kernel,
        along with every other subsystem's data.

MVS. Much of the I/O is controlled by user space applications. MVS
        is just a low level provider of I/O services and relies on
        complicated code in user applications to do data sharing. This
        is changing with Unix System Services and HFS data spaces but I
        suspect that the bulk of data movement is still being done by
        batch, IMS, CICS and VSAM data spaces, all controlled from user
        land.

Linux. All I/O is done by the kernel so the kernel contains all the
        complicated code to maintain data integrity.

MVS. IBM controls the levels of software in a system. SMP (System
        Management/Mangling Product) may have its faults but as a tool
        for building and tracking complicated binaries it makes RPM,
        DEB etc. look pitiful. IBM tests new versions of OS/390
        before shipping and it knows that its customers will not/cannot
        change the running OS.

Linux. Roll your own kernel. Select your own options, and add those
        15 patches you found from 6 different ftp sites. Do it all
        without any tools for version or reliability checks. Now try
        to report a bug based on a binary dump, when you have lost the
        System.map and config options for this kernel.

Don't get me wrong, I would love to have the same level of RAS and
debugging tools in Linux that OS/390 has. But getting Linux up to the
level of OS/390 RAS is at least an order of magnitude harder because of
the hardware and software constraints that Linux has to run under.

Right now Linux is about at the stage of MVS/SP 1.3, before cross
memory services were introduced. Back in 198x, most MVS data was in
CSA and crashes caused by storage corruption were a lot more frequent.
They were also a lot harder to diagnose, many is the time that IBM
asked for a problem to be reproduced, "try this slip trap, try that
slip trap, try this patch, sorry we cannot reproduce it". I won't
mention how bad IPCS version 1 was at reading crash dumps, nor how
difficult it is to add your own control block definitions even in the
latest IPCS.

Some of the problems above can be fixed in Linux.

* Add a master trace table.
* Move some data into different keys, if the hardware lets us.
* Get a crash dump facility, even if it is only supported on some
  hardware.
* Create decent tools for debugging crash dumps, anything would be
  better than reading a crash dump in IPCS.
* Standardize on tracking the System.map and .config with the kernel.
* Use the hardware trace and debug facilities as much as possible.
  Dprobes is a nice start.

The rest of the problems are inherent in the hardware that the two OS's
run on or are caused by the fundamentally different software models.

Keith Owens.
  Mainframe Sysprog since the days of VS/1, first machine was IBM 1620.
  Linux kdb, modutils, ksymoops maintainer, kbuild hacker.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Sep 15 2000 - 21:00:25 EST