Re: [PATCH] limit CPU time spent in kipmid

From: Corey Minyard
Date: Fri Mar 20 2009 - 15:28:48 EST


Greg KH wrote:
On Fri, Mar 20, 2009 at 10:30:45AM -0500, Corey Minyard wrote:
Greg KH wrote:
On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
Martin, thanks for the patch. I had actually implemented something like this before, and it didn't really help very much with the hardware I had, so I had abandoned this method. There's even a comment about it in si_sm_result smi_event_handler(). Maybe making it tunable is better, I don't know. But I'm afraid this will kill performance on a lot of systems.

Did you test throughput on this? The main problem people had without kipmid was that things like firmware upgrades took a *long* time; adding kipmid improved speeds by an order of magnitude or more.

It's my opinion that if you want this interface to work efficiently with good performance, you should design the hardware to be used efficiently by using interrupts (which are supported and disable kipmid). With the way the hardware is defined, you cannot have both good performance and low CPU usage without interrupts.

It may be possible to add an option to choose between performance and efficiency, but it will have to default to performance.
I would think that very infrequent things, like firmware upgrades, would
not take priority over a long-term "keep the cpu busy" type system, like
what we currently have.

Is there any way to switch between the different modes dynamically?
I like the idea of this change, as I have gotten a lot of complaints lately
about kipmi taking up way too much CPU time on idle systems, messing up
some users' process accounting rules in their management systems. But I
worry about making it a module parameter; why can't this be a
"self-tunable" thing?
It's actually already sort of self-tuning. kipmid sleeps unless there is IPMI activity. It only spins if it is expecting something from the controller.
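
To make that policy concrete, here is a toy model of the decision described above. It is only a sketch, not the driver code: the function name, the action strings, and the max_spin_us cap (standing in for the kind of limit the patch subject suggests) are all hypothetical.

```python
def next_action(expecting_response, spun_us, max_spin_us=None):
    """Toy model of the polling policy: sleep when idle, spin when waiting.

    expecting_response: True if a reply from the controller is pending.
    spun_us: microseconds already spent busy-waiting for this reply.
    max_spin_us: hypothetical cap on busy-waiting; None means no cap.
    """
    if not expecting_response:
        # No IPMI activity: nothing to wait for, so give up the CPU.
        return "sleep"
    if max_spin_us is not None and spun_us >= max_spin_us:
        # Cap reached: trade response latency for lower CPU usage.
        return "sleep"
    # A reply is pending and we are under the cap: keep polling.
    return "spin"
```

With no cap the model spins for as long as a reply is pending, which is the performance-first behavior; a small cap moves it toward lower CPU usage at the cost of throughput.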

I've been thinking about this a little more. Assuming that the self-tuning is working (and it appears to be working fine on my systems), that means that something is causing the IPMI driver to constantly talk to the management controller. I can think of three things:

1. The user is constantly sending messages to the management controller.
2. There is something wrong with the hardware, like the ATTN bit is
stuck high, causing the driver to constantly poll the management
controller.
3. The driver either has a bug or needs some more work to account for
something the hardware needs it to do to clear the ATTN bit.

If it's #1 above, then I don't know if there is anything we can do about it. The patch Martin sent will simply slow things down.

Do the "normal" ipmi userspace tools do #1?
That depends how they are used and configured. If you make them constantly poll for events or grab sensor values, then they will just use CPU. By default they shouldn't do anything.

For #2, this might make sense, as I have had reports of some hardware
working just fine, while others have the load issue. Both were
different hardware manufacturers.

#2 and #3 will require someone to do some debugging. If the ATTN bit is stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats constantly going up. Actually, the contents of that file would be helpful, along with /proc/ipmi/0/stats.
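
Something like the following could be used to check whether the counter is climbing. It is a rough sketch that assumes the file consists of "name: value" lines with an integer value; the helper names and the 10-second default interval are made up for illustration.

```python
import time


def parse_stats(text):
    """Parse 'name: value' lines into a dict of ints; skip anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def attentions_delta(before_text, after_text):
    """Return how much the 'attentions' counter grew between two snapshots."""
    before = parse_stats(before_text)
    after = parse_stats(after_text)
    return after.get("attentions", 0) - before.get("attentions", 0)


def read_file(path):
    with open(path) as f:
        return f.read()


def sample(path="/proc/ipmi/0/si_stats", interval=10.0):
    """Snapshot the stats file twice, `interval` seconds apart (assumed layout)."""
    first = read_file(path)
    time.sleep(interval)
    second = read_file(path)
    return attentions_delta(first, second)
```

Calling sample() on an otherwise idle machine and getting a large positive number would point toward a stuck ATTN bit (#2) or a driver problem (#3) rather than user traffic.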

Martin has one of these machines, right? If not, I can dig and try to
get some information as well.
I'll wait for Martin, hopefully he can get the info.

Thanks,

-corey