Re: [PATCH 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
From: Huang, Kai
Date: Wed Jul 30 2025 - 07:58:13 EST
On Wed, 2025-07-30 at 13:54 +0300, Adrian Hunter wrote:
> But there are also additional places where it seems like MCI_ADDR_PHYSADDR
> is missing:
>
> tdx_dump_mce_info()
> paddr_is_tdx_private()
> __seamcall_ret(TDH_PHYMEM_PAGE_RDMD, &args)
> TDH_PHYMEM_PAGE_RDMD expects KeyID bits to be zero
This is only called in mce_panic() path, which basically means the #MC is
fatal, e.g., happens in kernel context.
The intention of this is to catch any #MC due to kernel bug (i.e.,
software issue, but not hardware error) which does partial write to TDX
private memory and read at a later time, and report a more precise
information to the user to point out this could be due to "possible kernel
bug". See changelog of 70060463cb2b ("x86/mce: Differentiate real
hardware #MCs from TDX erratum ones").
In other words, for this case the address reported via MCI_ADDR_PHYSADDR
should not contain any KeyID bits since the kernel always uses keyID 0 to
read.
I believe the KeyID bits will only be appended to the physical address
reported in MCI_ADDR_PHYSADDR when the #MC was triggered from TDX guest,
i.e., when the CPU was accessing memory using TDX KeyID. Such #MC is not
fatal and won't call into mce_panic().
That being said, for tdx_dump_mce_info(), while explicitly masking out
keyID bits in MCI_ADDR_PHYSADDR obviously doesn't hurt (or arguably better
in some way), it is not necessary AFAICT.