Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake

From: Thomas Gleixner
Date: Thu Jun 07 2018 - 16:25:08 EST



On Thu, 7 Jun 2018, Dan Williams wrote:

> On Thu, Jun 7, 2018 at 10:43 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> > On Fri, May 25, 2018 at 02:42:09PM -0700, Tony Luck wrote:
> >> Currently we just check the "CAPID0" register to see whether the CPU
> >> can recover from machine checks.
> >>
> >> But there are also some special SKUs which do not have all advanced
> >> RAS features, but do enable machine check recovery for use with NVDIMMs.
> >>
> >> Add a check for any of bits {8:5} in the "CAPID5" register (each
> >> reports some NVDIMM mode available, if any of them are set, then
> >> the system supports memory machine check recovery).
> >>
> >> Cc: stable@xxxxxxxxxxxxxxx # 4.9
> >> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> >> ---
> >
> > Has this stalled somewhere? I'd like to see this one go into the
> > 4.18 merge because it unbreaks some real hardware.
> >
> > Parts 1 & 2 are nice-to-have, but they just make for better error
> > messages so aren't as critical.
>
> I'm making an effort to get all persistent memory error handling holes
> covered this cycle, so I think it makes sense for this to go through
> the nvdimm tree. This looks sufficiently non-controversial that I
> could justify sending it to Linus along with the other pmem updates.

I've picked it up already and please can we let stuff go through the right
trees? The world does not stop turning if a fix goes in 2 days later.

Thanks,

tglx