Re: [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available

From: Luck, Tony
Date: Fri Nov 18 2016 - 11:41:53 EST


On Fri, Nov 18, 2016 at 02:00:22PM +0100, Borislav Petkov wrote:
> On Thu, Nov 17, 2016 at 04:35:48PM -0800, Luck, Tony wrote:
> > @@ -2134,8 +2140,37 @@ static int __init mcheck_enable(char *str)
> > }
> > __setup("mce", mcheck_enable);
> >
> > +static void mcheck_intel_ppin_init(void)
>
> So this functionality could all be moved to arch/x86/kernel/cpu/intel.c
> where you could set an artificial X86_FEATURE_PPIN and get rid of the
> have_ppin var.

Ok - will do.

> > + switch (boot_cpu_data.x86_model) {
> > + case INTEL_FAM6_IVYBRIDGE_X:
> > + case INTEL_FAM6_HASWELL_X:
> > + case INTEL_FAM6_BROADWELL_XEON_D:
> > + case INTEL_FAM6_BROADWELL_X:
> > + case INTEL_FAM6_SKYLAKE_X:
> > + if (rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl))
> > + return;
>
> I don't think you need to check models - if the RDMSR fails, you're
> done.

Other models may use this MSR number for some other purpose. So the
read might succeed, but what I get might be something else entirely.
Technically with the model check I shouldn't have to use the _safe
versions ... but I'm paranoid that some SKUs might not implement this.
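A minimal user-space sketch of the allow-list approach Tony describes: treat MSR 0x4E as PPIN_CTL only on models known to define it there, instead of trusting any RDMSR that happens to succeed. The helper name ppin_model_known() and the explicit family-6 check are illustrative assumptions, not part of the patch. The numeric model values are my reading of the INTEL_FAM6_* constants in arch/x86/include/asm/intel-family.h.

```c
#include <stdbool.h>

/* Family-6 model numbers behind the INTEL_FAM6_* names used in the patch. */
enum {
	MODEL_IVYBRIDGE_X      = 0x3E,
	MODEL_HASWELL_X        = 0x3F,
	MODEL_BROADWELL_X      = 0x4F,
	MODEL_SKYLAKE_X        = 0x55,
	MODEL_BROADWELL_XEON_D = 0x56,
};

/*
 * Hypothetical helper: only models on this list are trusted to use
 * MSR 0x4E as PPIN_CTL; everything else is rejected without reading it.
 */
static bool ppin_model_known(unsigned int family, unsigned int model)
{
	if (family != 6)
		return false;

	switch (model) {
	case MODEL_IVYBRIDGE_X:
	case MODEL_HASWELL_X:
	case MODEL_BROADWELL_X:
	case MODEL_SKYLAKE_X:
	case MODEL_BROADWELL_XEON_D:
		return true;
	default:
		return false;
	}
}
```

The allow-list fails closed: a new model gets no PPIN until someone adds it, rather than misinterpreting an MSR that means something else on that part. The _safe accessors then cover the remaining case of a listed model whose SKU does not implement the MSR.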

> > + if (msr_ppin_ctl == 1) {
>
> & BIT_ULL(0)
>
> for future robustness in case those other reserved bits get used.

Unlikely ... but paranoia is good (see above about using rdmsrl_safe).

> > + pr_info("PPIN available but disabled\n");
>
> We don't care, do we?

Probably not ... there might be a BIOS setting, but a user who finds
they aren't getting PPIN in their logs could diagnose it with their
own rdmsr checks ... will delete this pr_info().
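For the do-it-yourself diagnosis mentioned above, a user-space check could read MSR_PPIN_CTL (0x4E) through the msr driver, where the MSR number is used as the file offset into /dev/cpu/N/msr. This is a sketch assuming the msr module is loaded and the caller has root privileges. read_msr() and ppin_ctl_state() are hypothetical helpers, and the bit meanings (bit 0 = lock, bit 1 = enable) are inferred from how the patch treats the values 1 and 2.

```c
#define _XOPEN_SOURCE 700 /* for pread() under strict C modes */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MSR_PPIN_CTL 0x4e

/* Read one 64-bit MSR on the given CPU; returns 0 on success, -1 on error. */
static int read_msr(unsigned int cpu, uint32_t msr, uint64_t *val)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%u/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1; /* no such CPU, msr module not loaded, or not root */

	/* The msr driver uses the file offset as the MSR number. */
	n = pread(fd, val, sizeof(*val), (off_t)msr);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Decode only the lock (bit 0) and enable (bit 1) bits; ignore reserved bits. */
static const char *ppin_ctl_state(uint64_t ctl)
{
	switch (ctl & 3) {
	case 0:
		return "disabled, unlocked";
	case 1:
		return "disabled and locked (possibly a BIOS setting)";
	case 2:
		return "enabled";
	default:
		return "enabled and locked";
	}
}
```

Usage: if read_msr(0, MSR_PPIN_CTL, &ctl) returns 0, print ppin_ctl_state(ctl). This is roughly what running rdmsr 0x4e from msr-tools and decoding the low two bits by hand would tell you.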

> > + return;
> > + }
> > + /* if PPIN is disabled, but not locked, try to enable */
> > + if (msr_ppin_ctl == 0) {
>
> Also, properly masked off. There are [63:2] reserved bits which might be
> assigned someday.

Ok.

> > + wrmsrl_safe(MSR_PPIN_CTL, 2);
> > + rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl);
>
> Why aren't we programming a number here? Or are users supposed to do
> that?
>
> If so, please design a proper sysfs interface and not make them use
> msr-tools.

The PPIN is programmed at the fab. To the user it is just a handy
unique number. I think Intel can decode it back to which fab and
production run this chip came from (useful to us if there are many
chips reporting some error).

> > + }
> > + if (msr_ppin_ctl == 2)
> > + have_ppin = 1;
>
> set_cpu_cap(c, X86_FEATURE_PPIN);

Yes - that looks prettier.
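Folding the review comments together (test the lock and enable bits with masks instead of comparing the whole register, and set only the enable bit when writing), the decision flow can be modelled in user space. fake_ppin_ctl and fake_wrmsr() are hypothetical stand-ins for the real MSR; this model simply ignores writes once the lock bit is set, whereas real hardware may fault instead, which is one reason to keep the _safe accessors. In the kernel the same tests would wrap rdmsrl_safe()/wrmsrl_safe() and end in set_cpu_cap(c, X86_FEATURE_PPIN).

```c
#include <stdbool.h>
#include <stdint.h>

#define PPIN_CTL_LOCK   (1ULL << 0) /* bit 0: blocks further writes */
#define PPIN_CTL_ENABLE (1ULL << 1) /* bit 1: PPIN readable */

/* Hypothetical stand-in for MSR_PPIN_CTL on one CPU. */
static uint64_t fake_ppin_ctl;

/* Hypothetical write: in this model, writes are ignored once locked. */
static void fake_wrmsr(uint64_t val)
{
	if (!(fake_ppin_ctl & PPIN_CTL_LOCK))
		fake_ppin_ctl = val;
}

/* Returns true when PPIN ends up enabled, i.e. when set_cpu_cap() would run. */
static bool ppin_try_enable(uint64_t initial)
{
	uint64_t ctl;

	fake_ppin_ctl = initial;
	ctl = fake_ppin_ctl;

	/* Locked with enable clear: PPIN stays off. */
	if ((ctl & (PPIN_CTL_LOCK | PPIN_CTL_ENABLE)) == PPIN_CTL_LOCK)
		return false;

	/* Disabled but unlocked: set only the enable bit, keep reserved bits. */
	if (!(ctl & PPIN_CTL_ENABLE)) {
		fake_wrmsr(ctl | PPIN_CTL_ENABLE);
		ctl = fake_ppin_ctl; /* re-read, as the patch does */
	}

	return (ctl & PPIN_CTL_ENABLE) != 0;
}
```

With masking, a register that is both enabled and locked (value 3), or one with a reserved bit set, is still handled correctly; the whole-value comparisons in the patch would skip both.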

Thanks

-Tony