Re: New reliability technique

From: Avi Kivity
Date: Sat Feb 25 2006 - 14:49:57 EST


Jesper Juhl wrote:

On 2/25/06, Jesper Juhl <jesper.juhl@xxxxxxxxx> wrote:


On 2/25/06, Victor Porton <porton@xxxxxxxxxxx> wrote:


A minute ago I invented a new reliability enhancing technique.

In idle cycles (or periodically in expense of some performance) Linux can
calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
sums with previously calculated sums.

Additionally it can be done for allocated memory, if it will be write
protected before the first actual write. Moreover, all memory may be made
write-protected if it is not written e.g. more than a second. (When it
is written kernel would unlock it and allow to write, by a techniqie like
to how swap works.) If write-protected memory appears to be modified by
a check sum, this likewise indicates a bug.

If a sum is inequal, it would notice a bug in kernel or in hardware.

I suggest to add "Check free memory control sums" in config.



Implement it then and send a patch.




But, doesn't slab poisoning and the like already cover this ground somewhat?



No, they don't. They cover only a very small percentage of memory.

On the other hand, ECC memory and caches do this in hardware.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/