Re: [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available

From: Borislav Petkov
Date: Fri Nov 18 2016 - 08:00:31 EST


On Thu, Nov 17, 2016 at 04:35:48PM -0800, Luck, Tony wrote:
> From: Tony Luck <tony.luck@xxxxxxxxx>
>
> Intel Xeons from Ivy Bridge onwards support a processor identification
> number. On systems that have it, include it in the machine check record.
> I'm told that this would be helpful for users that run large data centers
> with multi-socket servers to keep track of which CPUs are seeing errors.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
> arch/x86/include/asm/msr-index.h | 4 ++++
> arch/x86/include/uapi/asm/mce.h | 1 +
> arch/x86/kernel/cpu/mcheck/mce.c | 35 +++++++++++++++++++++++++++++++++++
> 3 files changed, 40 insertions(+)

...

> @@ -2134,8 +2140,37 @@ static int __init mcheck_enable(char *str)
> }
> __setup("mce", mcheck_enable);
>
> +static void mcheck_intel_ppin_init(void)

So this functionality could all be moved to arch/x86/kernel/cpu/intel.c
where you could set an artificial X86_FEATURE_PPIN and get rid of the
have_ppin var.

> +{
> + unsigned long long msr_ppin_ctl;
> +
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> + return;

Then, that check can go.

> + switch (boot_cpu_data.x86_model) {
> + case INTEL_FAM6_IVYBRIDGE_X:
> + case INTEL_FAM6_HASWELL_X:
> + case INTEL_FAM6_BROADWELL_XEON_D:
> + case INTEL_FAM6_BROADWELL_X:
> + case INTEL_FAM6_SKYLAKE_X:
> + if (rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl))
> + return;

I don't think you need to check models - if the RDMSR fails, you're
done.

> + if (msr_ppin_ctl == 1) {

& BIT_ULL(0)

for future robustness in case those other reserved bits get used.

> + pr_info("PPIN available but disabled\n");

We don't care, do we?

> + return;
> + }
> + /* if PPIN is disabled, but not locked, try to enable */
> + if (msr_ppin_ctl == 0) {

Also, properly masked off. There are [63:2] reserved bits which might be
assigned someday.

> + wrmsrl_safe(MSR_PPIN_CTL, 2);
> + rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl);

Why aren't we programming a number here? Or are users supposed to do
that?

If so, please design a proper sysfs interface and not make them use
msr-tools.

> + }
> + if (msr_ppin_ctl == 2)
> + have_ppin = 1;

set_cpu_cap(c, X86_FEATURE_PPIN);

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix ImendÃrffer, Jane Smithard, Graham Norton, HRB 21284 (AG NÃrnberg)
--