Re: [PATCH v4] cper, apei, mce: Pass x86 CPER through the MCA handling chain

From: Yazen Ghannam
Date: Fri Sep 25 2020 - 12:20:06 EST


On Fri, Sep 25, 2020 at 09:54:06AM +0900, Punit Agrawal wrote:
> Borislav Petkov <bp@xxxxxxxxx> writes:
>
> > On Thu, Sep 24, 2020 at 12:23:27PM -0500, Smita Koralahalli Channabasappa wrote:
> >> > Even though it's not defined in the UEFI spec, it doesn't mean a
> >> > structure definition cannot be created.
> >
> > Created for what? That structure better have a big fat comment above it, what
> > firmware generates its layout.
>
> Maybe I could've used a better choice of words - I meant to define a
> structure with meaningful member names to replace the *(ptr + i)
> accesses in the patch.
>
> The requirement for documenting the record layout doesn't change -
> whether using raw pointer arithmetic vs a structure definition.
>
> >> > After all, the patch is relying on some guarantee of the meaning of
> >> > the values and their ordering.
> >
> > AFAICT, this looks like an ad-hoc definition and the moment they change
> > it in some future revision, that struct of yours becomes invalid so we'd
> > need to add another one.
>
> If there's no spec backing the current layout, then it'll indeed be an
> ad-hoc definition of a structure in the kernel. But considering that
> it's part of firmware / OS interface for an important part of the RAS
> story I would hope that the code is based on a spec - having that
> reference included would help maintainability.
>
> Incompatible changes will indeed break the assumptions in the kernel and
> code will need to be updated - regardless of the choice of kernel
> implementation; pointer arithmetic, structure definition - ad-hoc or
> spec provided.
>
> Having versioning will allow running older kernels on newer hardware and
> vice versa - but I don't see why that is important only when using a
> structure based access.
>

There is no versioning option for the x86 context info structure in the
UEFI spec, so I don't think there'd be a clean way to include version
information.

The format of the data in the context info is not totally ad-hoc, and it
does follow the UEFI spec. The "Register Array" field is raw data. This
may follow one of the predefined formats in the UEFI spec like the "X64
Register State", etc. Or, in the case of MSR and Memory Mapped
Registers, this is a raw dump of the registers starting from the address
shown in the structure. The two values that can be changed are the
starting address and the array size. These two together provide a window
to the registers. The registers are fixed, so a single context info
struture should include a single contiguous range of registers. Multiple
context info structures can be provided to include registers from
different, non-contiguous ranges.

This patch is checking if an MSR context info structure lines up with
the MCAX register space used on Scalable MCA systems. This register
space is defined in the AMD Processor Programming Reference for various
products. This is considered a hardware feature extension, so the
existing register layout won't change though new registers may be added.
A layout change would require moving to another register space which is
what happened going from legacy MCA (starting at address 0x400) to MCAX
(starting at address 0xC0002000) registers.

The only two things firmware can change are from what address does the
info start and where does the info end. So the implementation-specific
details here are that currently the starting address is MCA_STATUS (in
MCAX space) for a bank and the remaining info includes the other MCA
registers for this bank.

So I think the kernel can be strict with this format, i.e. the two
variables match what we're looking for. This patch already has a check
on the starting address. It should also include a check that "Register
Array Size" is large enough to include all the registers we want to
extract. If the format doesn't match, then we fall back to a raw dump
of the data like we have today.

Or the kernel can be more flexible and try to find the window of
registers based on the starting address. I think this is really
open-ended though.

Does this sound reasonable?

Thanks,
Yazen