Re: [PATCH] x86: auto poll/interrupt mode switch for CMC to stop CMC storm

From: Thomas Gleixner
Date: Thu May 24 2012 - 06:01:21 EST


On Thu, 24 May 2012, Borislav Petkov wrote:

> On Thu, May 24, 2012 at 10:23:38AM +0800, Chen Gong wrote:
> > Hi, Boris, when I write these codes I don't care if it is specific for
> > Intel or AMD.
>
> Well, but I do care so that when you leave and start doing something
> else, people after you can still read and maintain that code.
>
> > I just noticed it should be general for x86 platform and all related
> > codes are general too, which in mce.c, so I think it should be fine to
> > place the codes in mce.c.
>
> Are you kidding me? Only Intel has CMCI.
>
> Now, if some other vendor needs correctable errors interrupt rate
> throttling, they can carve it out, make it generic, and move it to mce.c.
>
> Otherwise, it belongs in mce_intel.c. For the same reason AMD error
> thresholding code belongs to mce_amd.c.

Aside from that, machine_check_poll() is called from other places as
well. So looking at mce_start_timer(), which is surprisingly the timer
callback:

The poll timer interval is self-adjusting and can shrink down to HZ/100.
So when you get into a state where the interval drops below HZ/5, we'll
trigger that CMCI storm in software and queue work even on machines
which do not support CMCI or have it disabled. Brilliant, isn't it?
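
To make the arithmetic concrete, here is a minimal userspace sketch of
such a self-adjusting interval. It is not the kernel code: the HZ value,
the upper bound, and both helper names are assumptions made up for
illustration. It halves the interval after a poll that logged an error,
doubles it otherwise, and clamps the result.

```c
#define HZ 1000                  /* assumed tick rate, illustration only */
#define CHECK_INTERVAL_SECS 300  /* assumed upper bound of the poll interval */

/*
 * Hypothetical helper: compute the next poll interval in jiffies.
 * Halve after a poll that logged an error, double after a clean poll,
 * and clamp to [HZ/100, CHECK_INTERVAL_SECS * HZ].
 */
static int next_interval(int cur, int logged_error)
{
	int lo = HZ / 100;
	int hi = CHECK_INTERVAL_SECS * HZ;
	int n = logged_error ? cur / 2 : cur * 2;

	if (n < lo)
		n = lo;
	if (n > hi)
		n = hi;
	return n;
}

/*
 * Hypothetical helper: count how many consecutive error polls it takes
 * for the interval to fall below the given threshold. Terminates as
 * long as threshold > HZ/100.
 */
static int polls_until_below(int start, int threshold)
{
	int polls = 0;

	while (start >= threshold) {
		start = next_interval(start, 1);
		polls++;
	}
	return polls;
}
```

Calling polls_until_below(CHECK_INTERVAL_SECS * HZ, HZ / 5) shows that a
plain burst of logged errors is enough to cross the HZ/5 line. Nothing
in this path knows or cares whether the CPU has CMCI at all.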

So that rate check belongs in intel_threshold_interrupt() and wants an
Intel-specific callback in mce_start_timer() to undo it.
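
Roughly along these lines. This is a userspace sketch only, not a
patch: every name, the threshold and the window length below are
hypothetical, and the real handler and timer callback look different.
The point is the split: the storm detection sits in the Intel interrupt
path, and the generic timer callback only calls a vendor hook, which
stays NULL on everything without CMCI.

```c
#include <stdbool.h>
#include <stddef.h>

#define HZ 1000            /* assumed tick rate, illustration only */
#define STORM_THRESHOLD 5  /* assumed: interrupts tolerated per window */

static bool storm_active;          /* true while we poll instead of taking CMCIs */
static int irq_count;              /* interrupts seen in the current window */
static unsigned long window_start; /* start of the current window, in jiffies */

/* Vendor hook called from the generic timer callback; NULL without CMCI. */
static void (*vendor_timer_hook)(bool errors_seen);

/* Intel side: leave storm mode once a poll comes back clean. */
static void sketch_intel_timer_hook(bool errors_seen)
{
	if (storm_active && !errors_seen) {
		storm_active = false;
		irq_count = 0;
	}
}

/* Intel side: the rate check lives in the interrupt handler. */
static void sketch_intel_threshold_interrupt(unsigned long now)
{
	if (now - window_start > HZ) {
		window_start = now;
		irq_count = 0;
	}
	if (++irq_count > STORM_THRESHOLD)
		storm_active = true;
}

/* Generic side: the timer callback has no storm logic of its own. */
static void sketch_timer_callback(bool errors_seen)
{
	if (vendor_timer_hook)
		vendor_timer_hook(errors_seen);
}

/* Only the Intel init path installs the hook. */
static void sketch_intel_init(void)
{
	vendor_timer_hook = sketch_intel_timer_hook;
}
```

With that split, a machine which never runs sketch_intel_init() can poll
as fast as it likes and never ends up in storm mode.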

Thanks,

tglx






