Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

From: Jue Wang
Date: Mon Apr 19 2021 - 16:35:24 EST


On Thu, 8 Apr 2021 10:08:52 -0700, Tony Luck wrote:
> KVM apparently passes a machine check into the guest. Though it seems
> to be missing the MCG_STATUS information to tell the guest whether this
> is an "Action Required" machine check, or an "Action Optional" (i.e.
> whether the poison was found synchronously by execution of the current
> instruction, or asynchronously).

The KVM_X86_SET_MCE ioctl takes a struct kvm_x86_mce parameter, so the
hypervisor can set mcg_status (and the other fields) with the necessary
semantics.

#ifdef KVM_CAP_MCE
/* x86 MCE */
struct kvm_x86_mce {
	__u64 status;
	__u64 addr;
	__u64 misc;
	__u64 mcg_status;
	__u8 bank;
	__u8 pad1[7];
	__u64 pad2[3];
};
#endif
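
For illustration only, here is a rough sketch of how a VMM might fill
that structure to distinguish the two cases. The struct mce_event type
and the fill_mce() helper are hypothetical names of mine; the struct is a
local mirror of the layout above so the sketch builds without kernel
headers. The bit positions and MCA error codes follow the Intel SDM
encodings as I understand them, so treat them as assumptions to check
rather than a reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of struct kvm_x86_mce (real code would
 * include <linux/kvm.h> and use the kernel's definition). */
struct mce_event {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
	uint64_t mcg_status;
	uint8_t bank;
	uint8_t pad1[7];
	uint64_t pad2[3];
};

/* IA32_MCG_STATUS bits (assumed from the SDM). */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/* IA32_MCi_STATUS bits (assumed from the SDM). */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_MISCV (1ULL << 59)
#define MCI_STATUS_ADDRV (1ULL << 58)
#define MCI_STATUS_S     (1ULL << 56)
#define MCI_STATUS_AR    (1ULL << 55)

#define MCACOD_DATA  0x0134ULL /* data load error */
#define MCACOD_SCRUB 0x00c0ULL /* memory scrubbing error */
#define MISC_ADDR_PHYS 2ULL    /* address mode: physical */

/* Fill an MCE record for guest physical address gpa.  action_required
 * selects "Action Required" (synchronous) vs "Action Optional". */
static void fill_mce(struct mce_event *mce, uint64_t gpa, uint8_t bank,
		     int action_required)
{
	assert(mce != NULL);
	memset(mce, 0, sizeof(*mce));
	mce->bank = bank;
	mce->addr = gpa;
	mce->misc = (MISC_ADDR_PHYS << 6) | 0xc; /* 4K granularity */
	mce->status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
		      MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
	mce->mcg_status = MCG_STATUS_MCIP;

	if (action_required) {
		mce->status |= MCI_STATUS_AR | MCACOD_DATA;
		mce->mcg_status |= MCG_STATUS_EIPV;
	} else {
		mce->status |= MCACOD_SCRUB;
		mce->mcg_status |= MCG_STATUS_RIPV;
	}
}
```

With the real structure, the filled record would then be handed to the
vCPU with something like ioctl(vcpu_fd, KVM_X86_SET_MCE, &mce), where
vcpu_fd is whatever vCPU file descriptor the VMM already holds.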

> > Are we documenting somewhere: "if your process gets a SIGBUS and this
> > and that, which means your page got offlined, you should do this and
> > that to recover"?

> Essentially it boils down to:
> SIGBUS handler gets additional data giving virtual address that has gone away

> 1) Can the application replace the lost page?
> Use mmap(addr, MAP_FIXED, ...) to map a fresh page into the gap
> and fill with replacement data. This case can return from SIGBUS
> handler to re-execute failed instruction
> 2) Can the application continue in degraded mode w/o the lost page?
> Hunt down pointers to lost page and update structures to say
> "this data lost". Use siglongjmp() to go to preset recovery path
> 3) Can the application shut down gracefully?
> Record details of the lost page. Inform next-of-kin. Exit.
> 4) Default - just exit
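
To make option 1 concrete, a minimal sketch of a SIGBUS handler that
maps a fresh page over the gap might look like the following. The
helper names (poison_base, replace_lost_range, install_mce_handler,
demo_refill) and the refill callback are hypothetical; a real
application would regenerate its own data there. The sketch also assumes
si_addr_lsb is at least the page shift, otherwise the MAP_FIXED mmap()
fails and the handler falls through to option 4.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical application hook: regenerate the lost contents. */
typedef void (*refill_fn)(void *base, size_t len);

static refill_fn app_refill; /* set by install_mce_handler() */

/* Stand-in refill routine for the sketch: fills with a marker byte. */
static void demo_refill(void *base, size_t len)
{
	memset(base, 0x5a, len);
}

/* Round the reported address down to the start of the affected range;
 * lsb is si_addr_lsb, i.e. log2 of the affected size (12 for 4K). */
static void *poison_base(void *addr, unsigned lsb)
{
	return (void *)((uintptr_t)addr & ~(((uintptr_t)1 << lsb) - 1));
}

/* Option 1: map fresh anonymous memory into the gap and refill it. */
static int replace_lost_range(void *addr, unsigned lsb, refill_fn refill)
{
	size_t len = (size_t)1 << lsb;
	void *p = mmap(poison_base(addr, lsb), len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	refill(p, len);
	return 0;
}

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)ctx;
	if ((si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO) &&
	    app_refill != NULL &&
	    replace_lost_range(si->si_addr, si->si_addr_lsb, app_refill) == 0)
		return; /* returning re-executes the failed instruction */
	_exit(1);       /* option 4: default, just exit */
}

static int install_mce_handler(refill_fn refill)
{
	struct sigaction sa;

	assert(refill != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO; /* needed to receive si_addr */
	sigemptyset(&sa.sa_mask);
	app_refill = refill;
	return sigaction(SIGBUS, &sa, NULL);
}
```

One caveat: as far as I know mmap() is not on the POSIX list of
async-signal-safe functions, so calling it from the handler relies on it
being a plain system call on Linux. The refill callback must likewise
stick to operations that are safe in signal context.
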
Two possible additions to these great points:
5) If for some reason the page cannot be unmapped (e.g.,
either because that would lose too much memory, as with
hugetlbfs 1G pages, or because of a THP split failure for
SHMEM THP), the kernel maintains consistent semantics
(i.e., MCE SIGBUS with vaddr) for all future accesses from
user space, by leaving the hwpoisoned page mapped or in the
radix tree.
6) If for some reason the vaddr is not available upon the
first MCE recovery and the page is unmapped, the kernel
provides the correct semantics (MCE SIGBUS with vaddr) in
subsequent page faults from user space accessing the same vaddr.