Re: [PATCH] x86, amd, mce: Prevent potential cpu-online oops

From: Steffen Persvold
Date: Thu Apr 04 2013 - 16:01:24 EST


On 4/4/2013 9:07 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2013 at 08:05:46PM +0200, Steffen Persvold wrote:
>> It made more sense (to me) to skip the creation of MC4 altogether
>> if you can't find the matching northbridge, since you can't reliably
>> do the dec_and_test() reference counting on the shared bank when you
>> don't have the common NB struct for all the shared cores.
>>
>> Or am I just smoking the wrong stuff?
>
> No, actually *this* explanation should've been in the commit message.
> You numascale people do crazy things with the hardware :) so explaining
> yourself more verbosely is an absolute must if anyone is to understand
> why you're changing the code.

Ok :)


> So please write a detailed commit message why you need this change,
> don't be afraid to talk about the big picture.

Will do.


> Also, I'm guessing this is urgent stuff and it needs to go into 3.9?
> Yes, no? If yes, this patch should probably be tagged for stable.

Yes. We first found the issue on -stable (3.8.2 iirc), because it doesn't
have the multi-domain support we needed (which was added in 3.9).


> Also, please redo this patch against tip:x86/ras which already has
> patches touching mce_amd.c.

Ok.


> Oh, and lastly, needless to say, it needs to be tested on a "normal",
> i.e. !numascale AMD multinode box, in case you haven't done so yet. :-)


It has been tested on "normal" platforms and NumaConnect platforms (Fam10h and Fam15h AMD processors, SCM and MCM versions).

Cheers,
Steffen