Re: New reliability technique

From: Victor Porton
Date: Sun Feb 26 2006 - 06:34:03 EST



On 25-Feb-2006 Avi Kivity wrote:
> Jesper Juhl wrote:
...
>>>On 2/25/06, Victor Porton <porton@xxxxxxxxxxx> wrote:
>>>
>>>
>>>>In idle cycles (or periodically in expense of some performance) Linux can
>>>>calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
>>>>sums with previously calculated sums.
>>>>
>>>>Additionally it can be done for allocated memory, if it will be write
>>>>protected before the first actual write. Moreover, all memory may be made
>>>>write-protected if it is not written e.g. more than a second. (When it
>>>>is written kernel would unlock it and allow to write, by a techniqie like
>>>>to how swap works.) If write-protected memory appears to be modified by
>>>>a check sum, this likewise indicates a bug.
>>>>
>>>>If a sum is inequal, it would notice a bug in kernel or in hardware.
>>>>
>>>>I suggest to add "Check free memory control sums" in config.

> No, they don't. They cover only a very small percentage of memory.
>
> On the other hand, ECC memory and caches do this in hardware.

Isn't it better to double check (especially after such risky things as
e.g. software suspend)?

We need to check not only for damaged hardware, but also for
kernel/modules bugs. For this ECC and cache reliability is useless.

--
Victor Porton (porton@xxxxxxxxxxx) - http://porton.ex-code.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/