Re: [PATCH] limit CPU time spent in kipmid
From: Corey Minyard
Date: Fri Mar 20 2009 - 15:28:48 EST
Greg KH wrote:
On Fri, Mar 20, 2009 at 10:30:45AM -0500, Corey Minyard wrote:
Greg KH wrote:
On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
Martin, thanks for the patch. I had actually implemented something like
this before, and it didn't really help very much with the hardware I had,
so I had abandoned this method. There's even a comment about it in
si_sm_result smi_event_handler(). Maybe making it tunable is better, I
don't know. But I'm afraid this will kill performance on a lot of
systems.
Did you test throughput on this? The main problem people had without
kipmid was that things like firmware upgrades took a *long* time; adding
kipmid improved speeds by an order of magnitude or more.
It's my opinion that if you want this interface to work efficiently with
good performance, you should design the hardware to be used efficiently
by using interrupts (which are supported and disable kipmid). With the
way the hardware is defined, you cannot have both good performance and
low CPU usage without interrupts.
It may be possible to add an option to choose between performance and
efficiency, but it will have to default to performance.
I would think that very infrequent things, like firmware upgrades, would
not take priority over a long-term "keep the cpu busy" type system, like
what we currently have.
Is there any way to switch between the different modes dynamically?
I like the idea of this change, as I have gotten a lot of complaints
lately about kipmi taking up way too much CPU time on idle systems,
messing up some users' process accounting rules in their management
systems. But I worry about making it a module parameter; why can't this
be a "self-tunable" thing?
It's actually already sort of self-tuning. kipmid sleeps unless there is
IPMI activity. It only spins if it is expecting something from the
controller.
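
As a toy illustration of that behaviour (not the driver's code; the
function names and the boolean "expecting a reply" input are made up
for this sketch), the decision amounts to:

```python
# Toy model of the sleep-or-spin decision described above.
# All names here are hypothetical; the real logic lives in the
# kernel's IPMI system interface driver and is more involved.

def next_action(expecting_reply):
    """Spin only while a reply from the controller is expected."""
    return "spin" if expecting_reply else "sleep"


def busy_fraction(polls):
    """Fraction of polls during which the thread would burn CPU.

    `polls` is a sequence of booleans, one per poll, saying whether
    the thread was expecting something from the controller.
    """
    polls = list(polls)
    if not polls:
        return 0.0
    spins = sum(1 for p in polls if next_action(p) == "spin")
    return spins / len(polls)
```

With mostly idle polls the busy fraction stays small; if every poll
reports that something is expected, it goes to 1.0 and the thread
never sleeps.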
I've been thinking about this a little more. Assuming that the self-tuning
is working (and it appears to be working fine on my systems), that means
that something is causing the IPMI driver to constantly talk to the
management controller. I can think of three things:
1. The user is constantly sending messages to the management controller.
2. There is something wrong with the hardware, like the ATTN bit is
stuck high, causing the driver to constantly poll the management
controller.
3. The driver either has a bug or needs some more work to account for
something the hardware needs it to do to clear the ATTN bit.
If it's #1 above, then I don't know if there is anything we can do about
it. The patch Martin sent will simply slow things down.
Do the "normal" ipmi userspace tools do #1?
That depends how they are used and configured. If you make them
constantly poll for events or grab sensor values, then they will just
use CPU. By default they shouldn't do anything.
For #2, this might make sense, as I have had reports of some hardware
working just fine, while others have the load issue. Both were
different hardware manufacturers.
#2 and #3 will require someone to do some debugging. If the ATTN bit is
stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats
constantly going up. Actually, the contents of that file would be helpful,
along with /proc/ipmi/0/stats.
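
A rough sketch of that check, assuming the stats file consists of
"name: value" lines with integer counters (the helper names below are
made up for illustration, and the exact file layout may differ between
kernel versions):

```python
# Sketch: compare two snapshots of an IPMI stats file and report how
# fast one counter (e.g. "attentions") is climbing.
# Assumption: each line looks like "name:   <integer>".

def parse_stats(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def counter_rate(before, after, field, seconds):
    """Increase of one counter per second between two snapshots."""
    return (after[field] - before[field]) / seconds


def read_stats(path="/proc/ipmi/0/si_stats"):
    """Read and parse the stats file (path taken from the text above)."""
    with open(path) as f:
        return parse_stats(f.read())
```

On an affected machine, calling read_stats() twice a few seconds apart
and passing both results to counter_rate(..., "attentions", seconds)
gives the rate; a steadily large value on an otherwise idle system
would point at a stuck ATTN bit.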
Martin has one of these machines, right? If not, I can dig and try to
get some information as well.
I'll wait for Martin, hopefully he can get the info.
Thanks,
-corey