Re: [PATCH] x86/CPU/AMD: Ignore invalid reset reason value

From: Mario Limonciello
Date: Thu Jul 24 2025 - 17:02:57 EST


On 7/24/2025 3:58 PM, Sean Christopherson wrote:
On Wed, Jul 23, 2025, Borislav Petkov wrote:
On July 23, 2025 9:34:26 PM GMT+03:00, Yazen Ghannam <yazen.ghannam@xxxxxxx> wrote:
On Tue, Jul 22, 2025 at 06:56:15PM +0200, Borislav Petkov wrote:
On Mon, Jul 21, 2025 at 06:11:54PM +0000, Yazen Ghannam wrote:
The reset reason value may be "all bits set", e.g. 0xFFFFFFFF. This is a
commonly used error response from hardware. This may occur due to a real
hardware issue or when running in a VM.

Well, which one is Libing reporting: a VM or a real hw issue?


In this case, it was a VM.

If it is a VM, is that -1 the only thing a VMM returns when reading that
MMIO address or can it be anything?

If latter, you need to check X86_FEATURE_HYPERVISOR.

Same for a real hw issue.

IOW, is -1 the *only* invalid data we can read here or are we playing
whack-a-mole with it?


I see your point, but I don't think we can know for sure all possible
cases. There are some reserved bits that shouldn't be set. But these
definitions could change in the future.

And it'd be a pain to try and verify combinations of bits and configs.
Like can bit A and B be set together, or can bit C be set while running
in a VM, or can bit D ever be set on Model Z?

The -1 (all bits set) is the only "applies to all cases" invalid data,
since this is a common hardware error response. So we can at least check
for this.
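As a rough illustration of the check Yazen describes, here is a minimal userspace C sketch. The helper name `reset_reason_is_all_ones()` and the assumption that the register is read as a 32-bit value are hypothetical and chosen for illustration. This is not the code from the patch. It rejects only the one value that is invalid everywhere and deliberately leaves reserved bits alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Common error response for a read with no valid data behind it:
 * all bits set, i.e. 0xFFFFFFFF, which is (uint32_t)-1. */
#define RESET_REASON_ALL_ONES UINT32_MAX

/*
 * Hypothetical helper: flag only the value that is invalid in every
 * configuration. Reserved-bit combinations are intentionally not
 * checked, since their definitions may change on future parts.
 */
static bool reset_reason_is_all_ones(uint32_t value)
{
	return value == RESET_REASON_ALL_ONES;
}
```

Because `(uint32_t)-1 == 0xFFFFFFFF`, "-1" and "all bits set" in this thread refer to the same 32-bit value.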

Thanks,
Yazen

I think you should check both: HV or -1.

HV covers the VM angle as they don't emulate this

You can't possibly know that. If there exists a hardware spec of any kind, it's
fair game for emulation.

and we simply should disable this functionality when running as a guest.

-1 covers the known-bad hw value.
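The two-pronged policy Borislav proposes can be sketched as a single predicate. In this minimal C sketch, `running_as_guest` is a hypothetical stand-in for the kernel's `X86_FEATURE_HYPERVISOR` test (e.g. via `boot_cpu_has()`), and `should_report_reset_reason()` is likewise a name made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the proposed policy: do not report a reset reason when
 * running as a guest, or when the register reads back as all ones.
 * Both the function and the 'running_as_guest' flag are hypothetical.
 */
static bool should_report_reset_reason(bool running_as_guest, uint32_t value)
{
	if (running_as_guest)
		return false;	/* HV: disable the feature in guests */
	if (value == UINT32_MAX)
		return false;	/* -1: known-bad hardware response */
	return true;
}
```

The guest check encodes the assumption that no VMM emulates this register, which is exactly the point Sean disputes in the thread.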

And in a guest, -1, i.e. 0xffffffff, is all but guaranteed to come from the VMM
providing PCI master abort semantics for reads to MMIO where no device exists.
That's about as "architectural" of behavior as you're going to get, so I don't
see any reason to assume no VMM will ever emulate whatever this feature is.
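To make the master-abort point concrete, here is a toy model of a VMM's MMIO read path. The struct and function names are hypothetical, and real VMMs dispatch MMIO very differently. The sketch only shows why a read from an address no emulated device claims yields the all-ones value that the patch filters out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulated device: one 32-bit register at 'base'. */
struct toy_mmio_dev {
	uint64_t base;
	uint32_t reg;
};

/*
 * Toy VMM read handler: return the register of the device that
 * claims 'addr'; if none does, return all ones, mimicking PCI
 * master-abort semantics for reads.
 */
static uint32_t toy_vmm_mmio_read32(const struct toy_mmio_dev *devs,
				    size_t ndevs, uint64_t addr)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].base == addr)
			return devs[i].reg;
	}
	return UINT32_MAX;	/* nothing there: reads as 0xFFFFFFFF */
}
```

A guest reading an unemulated reset-reason register through a handler like this would see exactly the 0xFFFFFFFF value discussed above.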

I don't really understand why there would be any value in a VMM emulating this feature. It's specifically about the reason the hardware saw for the last reboot. Those reasons are *hardware reasons*. I.e., you're never going to see a thermal event as the reason a guest was rebooted.

CF9 reset or ACPI power state transition are about all I can envision for guest reboot reasons. And even then, do you really *want* the VMM to track the reasons for a guest reboot?