Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

From: Jue Wang
Date: Wed Apr 14 2021 - 01:47:44 EST


On Tue, 13 Apr 2021 12:07:22 +0200, Petkov, Borislav wrote:

>> KVM apparently passes a machine check into the guest.

> Ah, there it is:

> static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
> {
> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk);
> }

This path is taken when an EPT #PF finds an access to a hwpoisoned page.
It sends SIGBUS to user space (KVM exits into user space) with the same
semantics as when a regular #PF finds an access to a hwpoisoned page.
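
For illustration, here is a minimal user-space sketch of decoding such a
SIGBUS. It assumes Linux with glibc, where siginfo_t carries si_addr and
si_addr_lsb for BUS_MCEERR_AR / BUS_MCEERR_AO. struct poison_info and
decode_mce_sigbus() are hypothetical names made up for this example, not
kernel or KVM interfaces.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a kernel or KVM API. */
struct poison_info {
    int action_required;  /* 1 for BUS_MCEERR_AR, 0 for BUS_MCEERR_AO */
    uintptr_t start;      /* first byte of the affected range */
    size_t length;        /* size of the affected range in bytes */
};

/*
 * Returns 1 and fills *out if si describes a memory-failure SIGBUS,
 * otherwise returns 0 and leaves *out untouched.
 */
static int decode_mce_sigbus(const siginfo_t *si, struct poison_info *out)
{
    if (si->si_signo != SIGBUS)
        return 0;
    if (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO)
        return 0;

    /* si_addr_lsb is the log2 of the poisoned granule (PAGE_SHIFT above). */
    size_t len = (size_t)1 << si->si_addr_lsb;

    out->action_required = (si->si_code == BUS_MCEERR_AR);
    out->length = len;
    /* Round the reported address down to the start of the granule. */
    out->start = (uintptr_t)si->si_addr & ~(uintptr_t)(len - 1);
    return 1;
}
```

A VMM would typically call something like this from an SA_SIGINFO handler
for SIGBUS and then decide whether to forward the error to the guest.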

The KVM_X86_SET_MCE ioctl actually injects a machine check into the guest.
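
A rough sketch of what such an injection can look like from the VMM side,
assuming an x86 host with <linux/kvm.h> available and a vCPU fd on which
KVM_X86_SETUP_MCE has already been done. The helper names, the choice of
bank, and the SRAR data-load encoding (MCACOD 0x134, 4K granularity) are
assumptions for this example, not what any particular VMM does.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* IA32_MCi_STATUS and IA32_MCG_STATUS bits, per the Intel SDM. */
#define MCI_STATUS_VAL    (1ULL << 63)
#define MCI_STATUS_UC     (1ULL << 61)
#define MCI_STATUS_EN     (1ULL << 60)
#define MCI_STATUS_MISCV  (1ULL << 59)
#define MCI_STATUS_ADDRV  (1ULL << 58)
#define MCI_STATUS_S      (1ULL << 56)
#define MCI_STATUS_AR     (1ULL << 55)
#define MCG_STATUS_EIPV   (1ULL << 1)
#define MCG_STATUS_MCIP   (1ULL << 2)

/* Hypothetical helper: describe an action-required data-load error. */
static void build_srar_mce(struct kvm_x86_mce *mce, uint64_t guest_paddr,
                           uint8_t bank)
{
    memset(mce, 0, sizeof(*mce));
    mce->bank = bank;
    mce->status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
                  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S |
                  MCI_STATUS_AR | 0x134;  /* MCACOD: data load (assumed) */
    mce->mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
    mce->addr = guest_paddr;
    /* MISC: physical address mode (2) in bits 8:6, address LSB 12 in 5:0. */
    mce->misc = (2ULL << 6) | 12;
}

/* Hypothetical helper: returns the ioctl result (0 on success, -1 on error). */
static int inject_srar_mce(int vcpu_fd, uint64_t guest_paddr, uint8_t bank)
{
    struct kvm_x86_mce mce;

    build_srar_mce(&mce, guest_paddr, bank);
    return ioctl(vcpu_fd, KVM_X86_SET_MCE, &mce);
}
```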

We are in the process of launching MCE recovery capability in a KVM-based
virtualization product and plan to expand the scope of its application in
the near future.

> So what I'm missing with all this fun is, yeah, sure, we have this
> facility out there but who's using it? Is anyone even using it at all?

In-memory database and analytics workloads are definitely using it.
A couple of examples:
SAP HANA - which we have tested and plan to launch as a strategic
enterprise use case with MCE recovery capability in our product
SQL Server - https://support.microsoft.com/en-us/help/2967651/inf-sql-server-may-display-memory-corruption-and-recovery-errors


Cheers,
-Jue