Re: [RT] [RFC] simple SMI detector

From: Sven-Thorsten Dietrich
Date: Sat Jan 24 2009 - 21:13:42 EST

On Sat, 2009-01-24 at 19:57 -0500, Jon Masters wrote:
> On Sat, 2009-01-24 at 17:30 +0100, Thomas Gleixner wrote:
> > On Fri, 23 Jan 2009, Lee Revell wrote:
> > >
> > > FYI the RTAI project has a patch to allow disabling of SMI:
> > >
> > >
> >
> > FYI, this patch should be removed from the internet archives and the
> > author should be made liable for the damage which can happen when
> > innocent users try to use it for fixing things which simply can not be
> > fixed.

(given an arbitrary existing hardware platform)

Note however, that in many cases, it is possible - by system vendors -
to modify / update firmware and BIOS implementations.

> Well, we disagree on the remedy there (I personally thing "caveat
> emptor", although European law might well differ in that regard), but I
> think a warning is very much called for, especially where end users are
> concerned. Let's be sure those following along the LKML archives
> understand something:
> 1). You (the end user) don't want to disable SMIs on existing systems.
> 2). Doing so might prevent thermal management software, system
> management software, BIOS implemented fixes, and other components from
> operating.
> 3). BIOS vendors might need to be encouraged to reduce SMI overhead.
> Once systems have been designed to rely on lots of SMI overhead, the
> game has already been lost. No amount of user fiddling will adequately
> fix that.

4. Disabling SMIs and mucking with BIOS code may void your warranty,
cause your circuit breakers to trip, and could interfere with the
operation of your toaster.

> What we are interested in doing with things like smi_detector is
> noticing when there is substantial overhead, so that we can politely ask
> the vendors involved to implement alternative SMI handlers that have
> less of a hit on RT performance - the only real solution involving no
> SMIs is more expensive hardware with dedicated system management/thermal
> management processors.

Some SMI implementations involve dedicated, "embedded" auxiliary
processors and associated firmware,
others simply exist as BIOS-level implementations.

When the effect of SMI execution governs worst-case task-level or IRQ
latencies on a hardware platform, you may indeed be using the wrong

Hardware is subject to the market-driven process, admittedly in some
cases costly, cumbersome and difficult for those constrained by existing
installations and supply-side arrangements.

Latency-friendly SMI implementations may be found in some systems, and
not so much in others.

It should be up to the promotional divisions of the particular hardware
suppliers to assess the market pressure, and to report the performance
advantages of their specific systems.

Worst-case cpu-offlining and other execution-inhibiting system-level
latencies should be required as part of RFP by customers of systems
targeted for real-time, low-latency, and HPC applications.

I envision documentation of these specs, in a similar fashion to the
manner in which CPU clock-cycles are documented for specific instruction
executions - for those systems eligible (certified?) for low-latency

Broader SMP arrays combined with leveling-off CPU speed, cycle
management seems to be moving back to the fore-front on systems
architecture and design.

> > >> .... disabled SMIs with RTAI code the latencies of cyclictest were
> > >> good, but after about 20 minutes my system stopped working and now
> > >> it does not even boot anymore. :( Any hint?
> > >
> > > Yeah. Buy a new CPU. You deep-fried your P4 because you disabled the
> > > thermal protection.
> >
> > Aside of this worst case scenario, disabling SMIs is problematic as
> > SMI is used to emulate devices, to fix chip bugs and necessary for
> > enhanced features.
> >
> > The only reasonable thing you can do on a SMI plagued system is to
> > identify the device which makes use of SMIs. Legacy ISA devices and
> > USB are usually good candidates. If that does not help, don't use it
> > for real-time :)
> Indeed. This is why I wrote an smi_detector that sits in kernel space
> and can be reasonably sure measured discrepancies are attributable to
> SMI behavior. We want to log and detect such things before RT systems
> are deployed, not have users actively trying to work around SMI overhead
> after the fact.

Enterprise Distros (IMO) bear the responsibility to raise awareness of
these issues, and stand firm to relentlessly call for open specs and
open implementation.

Providing benchmarking tools is definitely an excellent step in that
direction (imo) - thanks.



> Jon.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at