Re: Aerospace and linux

From: Brian Gordon
Date: Thu Jun 10 2010 - 14:38:16 EST


> It's also a serious consideration for standard servers.
Yes. Good point.

> On server class systems with ECC memory hardware does that.

> Normally server class hardware handles this and the kernel then reports
> memory errors (e.g. through mcelog or through EDAC)

Agreed. EDAC is a good and sane solution, and most companies use it.
Some do not, due to naivety or cost reduction. EDAC doesn't cover
processor registers, though, and I have fairly good solutions for
dealing with that in tiny "home-grown" tasking systems.

On the more exotic end, I have also seen systems that have dual
redundant processors / memories. They add compare logic between the
redundant processors that compares most pins on each clock cycle. If
any pins are not identical on a given clock cycle, then something has
gone wrong (SEU, hardware failure, etc.).
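As a rough software model of that compare logic (the real thing is
hardware, so this is only an illustration), each processor's pin state
can be treated as one integer per clock cycle and the two traces
compared cycle by cycle. The function name and the sample traces below
are hypothetical.

```python
from typing import Optional, Sequence


def first_lockstep_mismatch(trace_a: Sequence[int],
                            trace_b: Sequence[int]) -> Optional[int]:
    """Return the first clock cycle where the two pin states differ."""
    for cycle, (pins_a, pins_b) in enumerate(zip(trace_a, trace_b)):
        if pins_a != pins_b:
            return cycle  # something went wrong on this cycle
    return None  # the processors stayed in lockstep


# One integer per clock cycle; each bit stands for one compared pin.
cpu_a = [0b1010, 0b1100, 0b0111, 0b0001]
cpu_b = [0b1010, 0b1100, 0b0101, 0b0001]  # one pin differs on cycle 2

fault_cycle = first_lockstep_mismatch(cpu_a, cpu_b)
print(fault_cycle)  # 2
```

A mismatch only says that the two sides disagree, not which one is
wrong, so a dual setup can detect a fault but cannot vote on it.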

> Lower end systems which are optimized for cost generally ignore the
> problem though and any flipped bit in memory will result
> in a crash (if you're lucky) or silent data corruption (if you're unlucky)

Right! And this is the area that I am interested in. Some people
insist on lowering the cost of the hardware without considering these
issues. One thing I want to do is to be as diligent as possible (even
in these low cost situations) and do the best job I can in spite of
the low cost hardware.

So, some pages of RAM are going to be read-only, and the data in those
pages came from some source (file system?). Can anyone describe a
high-level strategy to occasionally verify this data?

So far I have thought about adding an MD5 hash to the page descriptor
when a read-only page is first loaded/mapped, so that a background
daemon could occasionally verify the page against it. Does tripwire
accomplish this kind of detection by monitoring the underlying
filesystem (I don't think so)?
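
To make the idea concrete, here is a minimal user-space sketch, not
kernel code. It assumes 4096-byte pages and models RAM as a byte
buffer; PAGE_SIZE, record_page_hashes and find_corrupt_pages are
hypothetical names chosen for illustration. Hashes are taken once when
the read-only data is loaded, and a later pass (the job of the
background daemon) re-hashes each page and reports any that changed.

```python
import hashlib
from typing import List

PAGE_SIZE = 4096  # assumed page size in bytes


def record_page_hashes(image: bytes, page_size: int = PAGE_SIZE) -> List[bytes]:
    """Hash every page once, when the read-only data is first loaded."""
    return [
        hashlib.md5(image[off:off + page_size]).digest()
        for off in range(0, len(image), page_size)
    ]


def find_corrupt_pages(image: bytes, expected: List[bytes],
                       page_size: int = PAGE_SIZE) -> List[int]:
    """Re-hash every page and return the indexes whose digest changed."""
    current = record_page_hashes(image, page_size)
    return [i for i, (now, then) in enumerate(zip(current, expected))
            if now != then]


# Simulate three read-only pages, then a single flipped bit in page 1.
memory = bytearray(b"\xAA" * (3 * PAGE_SIZE))
baseline = record_page_hashes(bytes(memory))

memory[PAGE_SIZE + 10] ^= 0x04  # one bit flips, as an SEU would do

corrupt = find_corrupt_pages(bytes(memory), baseline)
print(corrupt)  # [1]
```

This only detects the corruption. Recovery would mean re-reading the
page from its backing source, and the stored hashes themselves sit in
the same unprotected RAM, so they can be hit as well.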