Re: Aerospace and linux

From: Chris Friesen
Date: Thu Jun 10 2010 - 14:30:16 EST


On 06/10/2010 11:29 AM, Brian Gordon wrote:

> When these SEU can be detected some action may be taken to improve
> the behaviour of the system (log a fault and reset in order to
> refresh things from scratch?). So the first question becomes how to
> detect an SEU.

I work on telco systems. We use ECC RAM, turn on ECC/parity on the
various buses, enable error-checking in the hardware, etc.

At higher abstraction levels you can checksum the data being stored and
validate it when you access it.
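
A minimal sketch of that store-and-validate idea, in Python for brevity.
The ChecksummedStore class and CorruptionError exception are hypothetical
names made up for illustration, and CRC32 is only one possible choice of
checksum:

```python
import zlib


class CorruptionError(Exception):
    """Raised when stored data no longer matches its checksum."""


class ChecksummedStore:
    """Hypothetical in-memory store that keeps a CRC32 next to each value."""

    def __init__(self):
        self._slots = {}

    def put(self, key, data: bytes) -> None:
        # Compute the checksum once, at the time the data is stored.
        self._slots[key] = (bytearray(data), zlib.crc32(data))

    def get(self, key) -> bytes:
        buf, expected = self._slots[key]
        # Re-compute on every access; a mismatch means the bytes changed
        # underneath us (for example, a flipped bit).
        if zlib.crc32(buf) != expected:
            raise CorruptionError(f"checksum mismatch for {key!r}")
        return bytes(buf)
```

The caller decides what a CorruptionError means for the system: reload
the data from a known-good copy, log a fault, or reset.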

Some of the errors are "soft" and can be corrected; others are "hard"
and uncorrectable. If you get enough "soft" errors in a short enough
time, it may be desirable to treat that as a "hard" error and reset.
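
One way to express that escalation policy is a sliding time window over
corrected-error events. This is only a sketch: the SoftErrorMonitor name,
the threshold, and the window length are assumptions, and real values
would depend on the hardware and the application.

```python
from collections import deque


class SoftErrorMonitor:
    """Hypothetical tracker that escalates bursts of corrected errors."""

    def __init__(self, max_errors: int, window_seconds: float):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._stamps = deque()

    def record(self, now: float) -> bool:
        """Record one soft error at time `now` (seconds).

        Returns True when the number of soft errors inside the window
        reaches the threshold and should be treated as a hard error.
        """
        self._stamps.append(now)
        # Drop events that have aged out of the window.
        while self._stamps and now - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        return len(self._stamps) >= self.max_errors
```

In use, each corrected-error report would call
record(time.monotonic()), and a True result would trigger the same
path as an uncorrectable error.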

> Thank you to anyone for any pointers on where I can look to learn
> more about detecting SEU in linux.

You might start by taking a look at the "edac" code in the kernel.
Linux doesn't normally enable all of the fault detection that the
hardware supports, so you may need to read the datasheets to see what
else can be turned on.
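
When an EDAC driver is loaded for the memory controller, it exposes
per-controller counters of corrected and uncorrected errors through
sysfs. A small sketch for polling them follows; read_edac_counts is a
hypothetical helper, and the default path assumes the usual
/sys/devices/system/edac/mc layout with ce_count and ue_count files,
which may be absent on systems without EDAC support.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (corrected, uncorrected)} from EDAC sysfs.

    Returns an empty dict if the directory is missing, for example when
    no EDAC driver is loaded.
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    counts = {}
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters cannot be read or parsed.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```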

Chris

--
The author works for GENBAND Corporation (GENBAND) who is solely
responsible for this email and its contents. All enquiries regarding
this email should be addressed to GENBAND. Nortel has provided the use
of the nortel.com domain to GENBAND in connection with this email solely
for the purpose of connectivity and Nortel Networks Inc. has no
liability for the email or its contents. GENBAND's web site is
http://www.genband.com