[RFC] How drivers notice a HW error?
From: Hidetoshi Seto
Date: Thu Nov 27 2003 - 03:31:24 EST
This is a request for comments, especially comments from driver developers.
On some platforms, for example IA64, the chipset detects an error caused by a
driver's operation, such as an I/O read, and reports it to the kernel. The Linux
kernel analyzes the error and decides whether to kill the driver or, at worst,
reboot.
I want to convey the error information to the offending driver, and to enable
the driver to recover from the failed operation.
So, as a rough plan, I am thinking about a readb_check function that can
return an error value if an error occurs on the read. Drivers could use
readb_check instead of the usual readb, and could determine from the return
value of readb_check whether a retry is required.
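
As an illustration of the driver side only, here is a minimal user-space
sketch of a retry loop around readb_check. Everything except the name
readb_check is an assumption made for the example: the prototype, the -EIO
return value, the fake_dev structure that simulates a register which faults a
fixed number of times, and the read_with_retry helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical simulated device: a register whose reads fail a few times. */
struct fake_dev {
    unsigned char reg;   /* value the register holds */
    int faults_left;     /* number of reads that will still fail */
};

/*
 * Assumed prototype: the RFC names readb_check but does not define it.
 * Returns 0 and stores the byte on success, -EIO if the read faulted.
 */
static int readb_check(struct fake_dev *dev, unsigned char *val)
{
    assert(dev != NULL && val != NULL);
    if (dev->faults_left > 0) {
        dev->faults_left--;
        return -EIO;
    }
    *val = dev->reg;
    return 0;
}

/* Hypothetical driver helper: retry the read up to max_tries times. */
static int read_with_retry(struct fake_dev *dev, unsigned char *val,
                           int max_tries)
{
    int err = -EIO;

    for (int i = 0; i < max_tries; i++) {
        err = readb_check(dev, val);
        if (err == 0)
            break;
    }
    return err;
}
```

A driver would give up, or escalate, once read_with_retry itself returns an
error; how many retries are reasonable is left open here.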
To realize this, I am considering the following two designs:
+ readb_check on driver (with Notifier)
  - The hardware error handler (for example, the MCA handler on IA64) has a
    Notifier as a hook point.
  - A driver may register a hook function with the Notifier.
  - The Notifier calls the registered functions when an error occurs.
  - A called hook function checks the address of the error, and if the error
    seems to concern its own driver, sets an internal error flag and stops the
    Notifier by returning OK.
  - The hardware error handler checks the state of the Notifier and decides
    whether the system resumes.
  - The resumed driver may check the error flag after the read, and may retry
    the read if the flag is set.
  - Some interfaces, such as one for registering hooks, would be required.
  - Writing a hook function would be a burden on driver developers.
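
The Notifier flow above could be sketched roughly as follows, as a user-space
simulation rather than kernel code. All names here (register_hw_err_hook,
drv_hook, hw_error_handler, HOOK_DONE, HOOK_NEXT, struct drv_state) are
hypothetical and are not an existing kernel interface; the sketch also
assumes each driver owns one contiguous address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOOK_DONE 0   /* hook claimed the error; stop calling further hooks */
#define HOOK_NEXT 1   /* error is not ours; try the next hook */
#define MAX_HOOKS 8

/* Hypothetical per-driver state: owned address range plus the error flag. */
struct drv_state {
    uintptr_t base;
    size_t len;
    bool error_flag;
};

typedef int (*hw_err_hook)(uintptr_t err_addr, void *data);

struct hook_entry {
    hw_err_hook fn;
    void *data;
};

static struct hook_entry hooks[MAX_HOOKS];
static size_t nr_hooks;

/* Register a hook with the (simulated) Notifier; -1 if the table is full. */
static int register_hw_err_hook(hw_err_hook fn, void *data)
{
    assert(fn != NULL);
    if (nr_hooks == MAX_HOOKS)
        return -1;
    hooks[nr_hooks].fn = fn;
    hooks[nr_hooks].data = data;
    nr_hooks++;
    return 0;
}

/* Driver hook: claim the error only if the address is in our range. */
static int drv_hook(uintptr_t err_addr, void *data)
{
    struct drv_state *drv = data;

    if (err_addr < drv->base || err_addr - drv->base >= drv->len)
        return HOOK_NEXT;
    drv->error_flag = true;
    return HOOK_DONE;
}

/* Simulated error handler: true means some driver claimed the error. */
static bool hw_error_handler(uintptr_t err_addr)
{
    for (size_t i = 0; i < nr_hooks; i++) {
        if (hooks[i].fn(err_addr, hooks[i].data) == HOOK_DONE)
            return true;   /* system may resume */
    }
    return false;          /* nobody claimed it; not recoverable here */
}
```

After the handler resumes the system, the driver would test and clear its own
error_flag following each read and retry if it was set.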
+ readb_check on kernel
  - The kernel has a readb_check function.
  - Drivers may use readb_check instead of the usual readb.
  - The hardware error handler checks the address of the error, and if it
    occurred in readb_check, changes the return value of readb_check and
    resumes.
  - The driver may check the return value to notice an error in the last read.
  - Some overhead would be involved. (Possibly it is negligible, since
    I/O reads are already horribly slow.)
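
For the kernel-side variant, the idea that the error handler changes the
return value of readb_check might look like the user-space simulation below.
This is only a sketch under stated assumptions: check_ctx stands in for
"the error address lies inside readb_check", and fake_reg, bus_read and the
-EIO value are hypothetical. A real implementation would have to identify the
faulting instruction address instead of using a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* State of the read in progress (a stand-in for an address range check). */
static struct {
    bool active;   /* true while readb_check is executing its read */
    int err;       /* return value the handler may overwrite */
} check_ctx;

/* Simulated error handler: true if the error was attributed to readb_check. */
static bool hw_error_handler(void)
{
    if (!check_ctx.active)
        return false;        /* error happened elsewhere; not recoverable */
    check_ctx.err = -EIO;    /* change what readb_check will return */
    return true;             /* resume execution */
}

/* Hypothetical register that can be marked as faulting. */
struct fake_reg {
    unsigned char value;
    bool faulty;
};

/* Simulated bus access: a faulty register triggers the error handler. */
static unsigned char bus_read(const struct fake_reg *reg)
{
    if (reg->faulty) {
        (void)hw_error_handler();
        return 0xff;         /* data is meaningless after a failed read */
    }
    return reg->value;
}

/* Assumed prototype: 0 on success, -EIO if the handler flagged the read. */
static int readb_check(const struct fake_reg *reg, unsigned char *val)
{
    assert(reg != NULL && val != NULL);
    check_ctx.active = true;
    check_ctx.err = 0;
    *val = bus_read(reg);
    check_ctx.active = false;
    return check_ctx.err;
}
```

Compared with the Notifier variant, the driver writes no hook here; the cost
is the extra bookkeeping around every checked read.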
IMO, this is a general-purpose function that should be available on many
platforms. I also hear that Solaris has a similar implementation.
If you have any comments about this feature, or a different idea, please
tell me.