[PATCH v2 3/3] Add documentation and credits for BadRAM

From: Stefan Assmann
Date: Wed Jun 22 2011 - 07:20:04 EST

Add Documentation/BadRAM.txt for in-depth information and update

Signed-off-by: Stefan Assmann <sassmann@xxxxxxxxx>
Acked-by: Tony Luck <tony.luck@xxxxxxxxx>
Acked-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Documentation/BadRAM.txt | 370 +++++++++++++++++++++++++++++++++++
Documentation/kernel-parameters.txt | 6 +
3 files changed, 385 insertions(+), 0 deletions(-)
create mode 100644 Documentation/BadRAM.txt

diff --git a/CREDITS b/CREDITS
index dca6abc..d57d4af 100644
@@ -2899,6 +2899,15 @@ S: 6 Karen Drive
S: Malvern, Pennsylvania 19355

+N: Rick van Rein
+E: rick@xxxxxxxxxxx
+W: http://rick.vanrein.org/
+D: Memory, the BadRAM subsystem dealing with defective RAM modules.
+S: Haarlebrink 5
+S: 7544 WP Enschede
+S: The Netherlands
+P: 1024D/89754606 CD46 B5F2 E876 A5EE 9A85 1735 1411 A9C2 8975 4606
N: Stefan Reinauer
E: stepan@xxxxxxxx
W: http://www.freiburg.linux.de/~stepan/
diff --git a/Documentation/BadRAM.txt b/Documentation/BadRAM.txt
new file mode 100644
index 0000000..a98dcd2
--- /dev/null
+++ b/Documentation/BadRAM.txt
@@ -0,0 +1,370 @@
+The BadRAM feature enables Linux to run on broken memory. The
+resulting system will be stable and healthy, because the kernel
+simply never allocates the faulty pages for use. This is how
+to setup BadRAM if your memory is failing.
+As RAM memory grows smaller, it also becomes harder to manufacture
+chips that are perfect. Each single cell that is failing could cause
+an entire memory module to fail. Even though manufacturers put in
+extra cells to replace failed ones, it is still possible that the
+sensitive small structures get damaged by an electrical discharge on
+their pins. Such damage leads to problems in fixed locations of
+the address space of a memory module, which is what theory predicts
+and has been confirmed by years of experience with bad memory.
+It is not necessary for such a memory module to be discarded. All
+pages of memory behave the same, and if only we skip the failing
+pages we can continue to use the module for many more years. The
+operating system kernel simply has to avoid using the blocks that
+are damaged. This is easy to do in the part of the kernel where
+memory pages are allocated.
+Reasons for using BadRAM
+Chip manufacturing processes use lots of harsh chemicals, and the less
+of these used, the better. Being able to make good use of partially
+failed memory chips means that far less of those chemicals are needed
+to provide storage. This reduces expenses and it is lighter on the
+environment in which we live.
+This kernel feature clearly shows that Linux is "the flexible OS".
+If something does not work, fix it. Also, share it with all the
+others who could use it. After more than a decade of BadRAM,
+the response has been purely positive, because it has helped real
+people to solve real problems.
+One important use for this feature is with laptops that have their
+memory soldered in. Such laptops would have to be discarded as a
+whole, but with BadRAM in place they can continue to be used
+without further restrictions.
+Finally, running a system on broken memory is just plain cool ;-)
+Running example
+To run this project, I was given two DIMMs, 32 MB each. One, that we
+shall use as a running example in this text, contained 512 faulty bits,
+spread over 1/4 of the address range in a regular pattern. This looks
+a lot like the fauly pattern that many others have reported; the only
+common other pattern is a single faulty spot. With such memory, a few
+tricks with a thorough RAM tester and some binary calculations suffice
+to write these fault patterns down in 2 longword numbers. The format
+of these is hexadecimal, which is a condensed way of writing down the
+binary patterns that make the hardware patterns recognisable.
+After being patched and invoked with the properly formatted description,
+the kernel held back only the memory pages with faults, and never handed
+them out for allocation. The allocation routines could therefore
+progress as normally, without any adaptation. This is important, since
+all the work is done at booting time. After booting, the kernel does
+not have to do spend any time to implement BadRAM.
+As a result of this initial exercise, I gained 30 MB out of the 32 MB
+DIMM that would otherwise have been thrown away. Of course, these
+numbers scale up with larger memory modules, but the principle is
+the same.
+The structure of memory failures
+Memory chips are usually laid out in a roughly equal number of rows
+and columns, making it a square of cells that each store one bit.
+When addressing a bit, the processor sends the row and column in
+separate phases, and then reads or writes its value. The rows and
+columns are therefore visible on the outside of a chip.
+The connections of row and column lines to the outside world is
+usually protected by a buffer. It can happen that a static
+discharge damages such a buffer, causing an entire row or an
+entire column to fail. This means that a series of bits become
+unusable in a single page or in a regular pattern of pages,
+depending on whether it was a row or column that got damaged.
+For this reason, BadRAM was designed to describe memory faults
+in a pattern of address/mask pairs. An address locates an
+error and a zero on the corresponding position in the mask
+defines which bits in the address may be replaced with any
+other value. This has shown to work as a tight description
+of error patterns: it is very compact, but does not waste pages
+that are good.
+BadRAM's notation for memory faults
+Instead of manually providing all 512 errors in the running example
+to the kernel, it's easier to use a pattern notation. Since the
+regularity is based on address decoding software, which generally
+takes certain bits into account and ignores others, we shall
+provide a faulty address F, together with a bit mask M that
+specifies which bits must be equal to F. In C code, an address A
+is faulty if and only if
+ (F & M) == (A & M)
+or alternately (closer to a hardware implementation):
+ ~((F ^ A) & M)
+In the example 32 MB chip, I had the faulty addresses in 8MB-16MB:
+ xxx42f4 ....0100....
+ xxx62f4 ....0110....
+ xxxc2f4 ....1100....
+ xxxe2f4 ....1110....
+The second column represents the alternating hex digit in binary form.
+Apparently, the first and next to last binary digit can be anything,
+so the binary mask for that part is 0101. The mask for the part after
+this is 0xfff, and the part before should select anything in the range
+8MB-16MB, or 0x00800000-0x01000000; this is done with a bitmask
+0xff80xxxx. Combining these partial masks, we get:
+ F=0x008042f4 M=0xff805fff
+That covers every fault in this DIMM; for more complicated failing
+DIMMs, or for a combination of multiple failing DIMMs, it can be
+necessary to set up a number of such F/M pairs.
+Getting started
+If you experience RAM trouble, first read Documentation/memory.txt
+and try out the mem=4M trick to see if at least some initial parts
+of your RAM work well. Note that 4 MB will not be able to hold a
+modern desktop, so if you rely on that you would have to set the
+limit higher (and accept that your sanity check is not as tight as
+The BadRAM routines halt the kernel in panic if the reserved area
+of memory (containing kernel stuff) contains a faulty address. It
+will only do that when supplied with the patterns below; this
+initial check is merely to see if this is likely to happen.
+Running a memory checker
+There is no memory checker built into the kernel, to avoid delays
+at runtime or while booting. If you experience problems that may
+be caused by RAM, run a good outside RAM checker. The Memtest86
+checker is a popular, free, high-quality checker. Many Linux
+distributions include it as an alternate boot option, so you may
+simply find it in your boot loader's boot menu.
+The memory checker lists all addresses that have a fault. It will
+do this for a given configuration of the DIMMs in your motherboard;
+if you replace or move memory modules you may find other addresses.
+In the running example's 32 MB chip, with the DIMM in slot #0 on
+the motherboard, the errors were found in the 8MB-16MB range:
+ xxx42f4
+ xxx62f4
+ xxxc2f4
+ xxxe2f4
+The error reported was a "sticky 1 bit", a memory bit that always
+reads as "1" even if a "0" was just written to it. This is
+probably caused by a damaged buffer on one of the rows or columns
+in one of the memory chips.
+It would be a lot of work to collect the individual errors and
+condense them into a pattern. That is why I patched the
+Memtest86 (v2.3+) checker to directly print out the address/mask
+pairs that are used by this kernel feature. All you would do is
+select the BadRAM printout option at the start of the scan, and
+then leave it running for hours and hours, until it has made at
+least one pass. The patterns are printed each time a bit is
+added, but each line contains all faults found up to that point,
+so you would write down the last set of patterns printed, and
+supply that as a boot option in your next run of a
+BadRAM-capable Linux kernel.
+If you use this patch on an x86_64 architecture, your addresses are
+twice as long. Fill up with zeroes in the address and with f's in
+the mask. The latter example would thus become:
+ mem=24M badram=0x0000000000f00000,0xfffffffffff00000
+The patch applies the changes to both x86 and x86_64 code bases
+at the same time. Patching but not compiling maps the entire
+source tree at once, which makes more sense than splitting the
+patch into an x86 and x86_64 branch, because those two branches
+could not be applied at the same time because they would overlap.
+Rebooting Linux
+Once the fault patterns are known we simply restart Linux with
+these F/M pairs as a parameter. If your normal boot options look
+ root=/dev/sda1 ro
+you should now boot with options
+ root=/dev/sda1 ro badram=0x008042f4,0xff805fff
+or perhaps by mentioning more F/M pairs in an order F0,M0,F1,M1,...
+When you provide an odd number of arguments to badram, the default
+mask 0xffffffff (meaning that only one address is matched) is
+applied to the last address.
+If your bootloader is GRUB, you can supply this additional
+parameter interactively during boot. This way, you can try them
+before you edit /boot/grub/grub.conf to put them in forever.
+When the kernel now boots, it should not give any trouble with RAM.
+Mind you, this is under the assumption that the kernel and its data
+storage do not overlap an erroneous part. If they do, and the
+kernel does not choke on it right away, BadRAM itself will stop the
+system with a kernel panic. When the error is that low in memory,
+you will need additional bootloader magic, to load the kernel at an
+alternative address.
+Now look up your memory status with
+ cat /proc/meminfo |grep HardwareCorrupted
+which prints a single line with information like
+HardwareCorrupted: 2048 kB
+The entry HardwareCorrupted: 2048k represents the loss of 2MB
+of general purpose RAM due to the errors. Or, positively rephrased,
+instead of throwing out 32MB as useless, you only throw out 2MB.
+Note that 2048 kB equals 512 pages of 4kB. The size of a page is
+defined by the processor architecture.
+If the system is stable (which you can test by compiling a few
+kernels, and a few file finds in / or so) you can decide to add
+the boot parameter to /boot/grub/grub.conf, in addition to any
+other boot parameters that may already be there. For example,
+ kernel /boot/vmlinuz root=/dev/sda1 ro
+would become
+ kernel /boot/vmlinuz root=/dev/sda1 ro badram=0x008042f4,0xff805fff
+Depending on how helpful your Linux distribution is, you may
+have to add this feature again after upgrading your kernel. If
+your boot loader is GRUB, you can always do this manually if you
+rebooted before you remembered to make that adaptation.
+BadRAM classification
+This technique might start a lively market for "dead" RAM. It is
+important to realise that some RAMs are more dead than others. So,
+instead of just providing a RAM size, it is also important to know
+the BadRAM class, which is defined as follows:
+ A BadRAM class N means that at most 2^N bytes have a problem,
+ and that all problems with the RAMs are persistent: They
+ are predictable and always show up.
+The DIMM that serves as an example here was of class 9, since 512=2^9
+errors were found. Higher classes are worse, "correct" RAM is of class
+-1 (or even less, at your choice).
+Class N also means that the bitmask for your chip (if there's just one,
+that is) counts N bits "0" and it means that (if no faults fall in the
+same page) an amount of 2^N*PAGESIZE memory is lost, in the example on
+an x86 architecture that would be 2^9*4k=2MB, which accounts for the
+initial claim of 30MB RAM gained with this DIMM.
+Note that this scheme has deliberately been defined to be independent
+of memory technology and of computer architecture.
+Further Possibilities
+**Slab allocation support**
+It would be possible to use even more of the faulty RAMs by employing
+them for slabs. The smaller allocation granularity of slabs makes it
+possible to throw out just, say, 32 bytes surrounding an error. This
+would mean that the example DIMM only caused a loss of 16kB instead
+of 2MB, or scaled-up similar values for larger memory sizes. One
+specific area that could benefit from this is the growing market
+for embedded devices, which usually wants to meet tight budgets.
+It should be possible to make the slab allocator prefer pages with
+broken memory, and allocate the faulty places in memory before the
+other slabs are made available to the kernel. In the best possible
+situation, this could reduce the loss of good RAM cells to zero!
+**Support for low-memory errors**
+To the best of my knowledge, boot loaders like GRUB cannot load
+the Linux kernel in non-standard locations. This means that any
+errors at low memory locations cannot be overcome with BadRAM.
+Anything that physically alters the memory layout can be used
+to overcome such problems; this may be achieved through BIOS
+settings, or by adding or swapping memory modules.
+A general solution could be to use a boot loader that can load
+the Linux kernel (and its initial memory allocation) at other
+memory addresses than are standard.
+**Boot-time memory checking**
+Many suggestions have been made to insert a RAM checker at boot time;
+since this would leave the time to do only very meager checking, it
+is not a reasonable option; we already have a half-done BIOS check
+doing that!
+**ECC RAM integration**
+It would be interesting to integrate this functionality with the
+self-verifying nature of ECC RAM. These memories can even distinguish
+between recoverable and unrecoverable errors! Such memory has been
+handled in older operating systems by `testing' once-failed memory
+blocks for a while, by placing only (reloadable) program code in it.
+I possess no faulty ECC modules to work this out, and there is no
+general use for it either.
+Names and Places
+The home page of this project is on
+ http://rick.vanrein.org/linux/badram
+This page also links to Nico Schmoigl's experimental extensions to
+this patch (with debugging and a few other fancy things).
+In case you have experiences with the BadRAM software which differ from
+the test reportings on that site, I hope you will mail me with that
+new information.
+The BadRAM project is an idea and implementation by
+ Rick van Rein
+ Haarlebrink 5
+ 7544 WP Enschede
+ The Netherlands
+ rick@xxxxxxxxxxx
+If you like it, a postcard would be much appreciated ;-)
+ Enjoy,
+ -Rick.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index cc85a92..a12b27a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -51,6 +51,7 @@ parameter is applicable:
FB The frame buffer device is enabled.
GCOV GCOV profiling is enabled.
HW Appropriate hardware is enabled.
+ HWPOISON HWPOISON support is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
IOSCHED More than one I/O scheduler is enabled.
@@ -373,6 +374,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.

autotest [IA64]

+ badram= [HWPOISON]
+ Allows memory areas to be flagged as HWPOISON.
+ Format: <addr>,<mask>[,...]
+ See Documentation/BadRAM.txt
baycom_epp= [HW,AX25]
Format: <io>,<mode>


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/