Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

From: Aili Yao
Date: Mon Feb 22 2021 - 21:29:04 EST


On Mon, 22 Feb 2021 13:45:50 +0100
Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Mon, Feb 22, 2021 at 08:35:49PM +0800, Aili Yao wrote:
> > Guest VM, the qemu has no way to know the RIPV value, so always get it
> > cleared.
>
> What does that mean?
>
> The guest VM will get the MCE signature it gets from the host kernel so
> the host kernel most definitely knows the RIPV value.

When the guest accesses an address with a UE error, it exits guest mode and the host
does the recovery. The host then sends a SIGBUS to the vCPU thread, which QEMU catches.
The signal carries only the address and the error level; there is no RIPV in it, so QEMU
assumes RIPV is clear and injects the error into the guest OS that way.
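To make this concrete, here is a minimal C sketch of what a userspace VMM can read from that signal. It is not QEMU's code: `struct hwpoison_report`, `parse_hwpoison_sigbus()` and `guest_mcg_status()` are hypothetical names, and setting RIPV for the action-optional case is an assumption. The `siginfo_t` fields (`si_code`, `si_addr`, `si_addr_lsb`) and the `BUS_MCEERR_AR`/`BUS_MCEERR_AO` codes are the Linux SIGBUS interface, and the MCG_STATUS bit positions follow the Intel SDM.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits, per the Intel SDM. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/* Fallbacks for old headers; values match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4 /* hardware memory error, action required */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5 /* hardware memory error, action optional */
#endif

/* Hypothetical struct: everything a VMM learns from a memory-failure SIGBUS. */
struct hwpoison_report {
	int code;       /* BUS_MCEERR_AR or BUS_MCEERR_AO */
	void *addr;     /* host virtual address of the poisoned memory */
	short addr_lsb; /* log2 of the poisoned range, e.g. 12 for 4 KiB */
};

/* Fill *out from a memory-failure SIGBUS; return -1 for anything else. */
static int parse_hwpoison_sigbus(const siginfo_t *si,
				 struct hwpoison_report *out)
{
	if (si->si_signo != SIGBUS ||
	    (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO))
		return -1;
	out->code = si->si_code;
	out->addr = si->si_addr;
	out->addr_lsb = si->si_addr_lsb;
	return 0;
}

/*
 * Hypothetical: choose the MCG_STATUS value to inject into the guest.
 * The report has no RIPV field, so for action-required errors the VMM
 * cannot claim the guest IP is restartable and leaves RIPV clear.
 * Setting RIPV for action-optional errors is an assumption.
 */
static uint64_t guest_mcg_status(const struct hwpoison_report *r)
{
	uint64_t status = MCG_STATUS_MCIP;

	if (r->code == BUS_MCEERR_AR)
		status |= MCG_STATUS_EIPV; /* RIPV deliberately left clear */
	else
		status |= MCG_STATUS_RIPV;
	return status;
}
```

A VMM would call parse_hwpoison_sigbus() from a SIGBUS handler installed with SA_SIGINFO; the point is that nothing in the report tells it whether the guest's IP can be restarted.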

> It looks like you're testing how guests will handle MCEs which the host
> has caught and wants to inject into the guest for further handling. What
> is your exact use case? Please explain in detail how I can reproduce it
> step-by-step locally.

Yes, these are the steps I follow:
1. A small test program in the guest OS accesses an address A, into which a UC error will be
injected. The program logs the address, and with vtop you can get its guest physical address.

2. Use "virsh qemu-monitor-command guest --hmp gpa2hva 0xxxxxx" to get the corresponding
host user virtual address (in the QEMU process).

3. Use vtop on the host to get the host physical address from that user virtual address.

4. Use the einj module to inject an error of type 0x10 (memory uncorrectable non-fatal) at
that host physical address.

5. Then, when the guest accesses the address, you will see what happens.
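Steps 1, 3 and 4 can be scripted. The C sketch below is a stand-in, not the tools used above: in place of vtop it reads /proc/<pid>/pagemap (bit 63 = page present, bits 0-54 = PFN, with the PFN reported as 0 to callers without CAP_SYS_ADMIN), and for step 4 it writes the EINJ debugfs files in the order shown in the kernel's EINJ documentation. The helper names (`pagemap_entry_to_phys()`, `virt_to_phys()`, `write_attr()`, `einj_inject_mem()`) and the 4 KiB page mask are assumptions.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* /proc/<pid>/pagemap entry layout: bit 63 = present, bits 0-54 = PFN.
 * The PFN reads as 0 unless the caller has CAP_SYS_ADMIN. */
#define PM_PRESENT  (1ULL << 63)
#define PM_PFN_MASK ((1ULL << 55) - 1)

/* Convert one pagemap entry for vaddr into a physical address (0 = unknown). */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
				      uint64_t page_size)
{
	uint64_t pfn = entry & PM_PFN_MASK;

	if (!(entry & PM_PRESENT) || pfn == 0)
		return 0;
	return pfn * page_size + vaddr % page_size;
}

/* Stand-in for vtop: look vaddr up in a pagemap file such as
 * "/proc/self/pagemap" (step 1, guest) or "/proc/<qemu pid>/pagemap"
 * (step 3, host). Returns 0 on any failure. */
static uint64_t virt_to_phys(const char *pagemap_path, uint64_t vaddr)
{
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	uint64_t entry;
	off_t off = (off_t)(vaddr / page_size * sizeof(entry));
	int fd = open(pagemap_path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return 0;
	n = pread(fd, &entry, sizeof(entry), off);
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return 0;
	return pagemap_entry_to_phys(entry, vaddr, page_size);
}

/* Write one string value into the file dir/name. */
static int write_attr(const char *dir, const char *name, const char *val)
{
	char path[512];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Step 4: inject an EINJ memory error of the given type at paddr (needs
 * root). param2 is the address mask; this one selects the 4 KiB page
 * containing paddr. */
static int einj_inject_mem(const char *einj_dir, uint64_t type,
			   uint64_t paddr)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "0x%" PRIx64, paddr);
	if (write_attr(einj_dir, "param1", buf))
		return -1;
	if (write_attr(einj_dir, "param2", "0xfffffffffffff000"))
		return -1;
	snprintf(buf, sizeof(buf), "0x%" PRIx64, type);
	if (write_attr(einj_dir, "error_type", buf))
		return -1;
	return write_attr(einj_dir, "error_inject", "1");
}
```

In the guest you would pass "/proc/self/pagemap" and the address of A; on the host, "/proc/<qemu pid>/pagemap" and the address returned by gpa2hva, followed by einj_inject_mem("/sys/kernel/debug/apei/einj", 0x10, hpa) as root.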

Please use the latest upstream kernel for the guest OS. You may also need to set monarch_timeout
to a bigger value; otherwise you will see other issues besides the one discussed here.
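For the monarch_timeout change, a minimal C sketch assuming the sysfs knob described in the kernel's sysfs-mce ABI file (/sys/devices/system/machinecheck/machinecheckX/monarch_timeout, unit: microseconds). The helper names and the choice of machinecheck0 are assumptions; writing the file needs root.

```c
#include <stdio.h>

/* Assumed knob location; the value is in microseconds. */
#define MONARCH_TIMEOUT_PATH \
	"/sys/devices/system/machinecheck/machinecheck0/monarch_timeout"

/* Read the current timeout into *usec; returns 0 on success, -1 on failure. */
static int read_monarch_timeout(const char *path, long *usec)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", usec) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a new timeout in microseconds; returns 0 on success, -1 on failure. */
static int write_monarch_timeout(const char *path, long usec)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%ld\n", usec) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

On the test machine this would be called as write_monarch_timeout(MONARCH_TIMEOUT_PATH, usec) with whatever larger value suits the setup.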

Tks

Best Regards!
Aili Yao