Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

From: Luck, Tony
Date: Thu Jan 28 2021 - 12:45:21 EST


On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> when one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> mode it may be fixed by fixup_exception, current code will just return
> error code to user process.

Shouldn't the AO action that poisoned the page have also unmapped it?
>
> This is not suffient, we should send a SIGBUS to the process and log the
> info to console, as we can't trust the process will handle the error
> correctly.

I agree with this part ... few apps check for -EFAULT and do the right
thing. But I'm not sure how this happens. Can you provide a bit more
detail on the steps that lead to it?
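
For reference, here is roughly what "check for -EFAULT and do the right
thing" would look like from user space. This is only a sketch under
assumptions: the helper name write_from_unreadable_page() is made up, and
a PROT_NONE mapping stands in for a buffer the kernel cannot copy from
(it does not reproduce a real hwpoison fault).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): pass write() a buffer the
 * kernel cannot read and report the errno it fails with.
 * Returns 0 if the write unexpectedly succeeds, otherwise the errno seen.
 */
static int write_from_unreadable_page(void)
{
	int fds[2];
	int err = 0;
	long page = sysconf(_SC_PAGESIZE);

	/* Assumption: PROT_NONE models "kernel copy from this page faults". */
	void *buf = mmap(NULL, page, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return errno;

	if (pipe(fds) < 0) {
		err = errno;
		munmap(buf, page);
		return err;
	}

	/* The copy from user space faults, so write() fails with EFAULT. */
	if (write(fds[1], buf, 16) < 0) {
		err = errno;
		if (err == EFAULT)
			fprintf(stderr, "write: %s, buffer is unusable\n",
				strerror(err));
	}

	close(fds[0]);
	close(fds[1]);
	munmap(buf, page);
	return err;
}
```

An application that ignores the return value of write() here never
notices that its data was not written, which is the worry above.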

-Tony

P.S. Typo: s/suffient/sufficient/

>
> Suggested-by: Feng Yang <yangfeng1@xxxxxxxxxxxx>
> Signed-off-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
> ---
> arch/x86/mm/fault.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..36d1e385512b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  		 * In this case we need to make sure we're not recursively
>  		 * faulting through the emulate_vsyscall() logic.
>  		 */
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
> +			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +				current->comm, current->pid, address);
> +
> +		if ((current->thread.sig_on_uaccess_err && signal) ||
> +			(si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
> +#else
>  		if (current->thread.sig_on_uaccess_err && signal) {
> +#endif
>  			sanitize_error_code(address, &error_code);
>
>  			set_signal_archinfo(address, error_code);
> @@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>  {
>  	/* Kernel mode? Handle exceptions or die: */
>  	if (!(error_code & X86_PF_USER)) {
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
> +			no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> +		else
> +			no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#else
>  		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#endif
>  		return;
>  	}
>
> --
> 2.25.1
>
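
For anyone following along: the SIGBUS the patch wants to deliver carries
si_code == BUS_MCEERR_AR, which a process can tell apart from an ordinary
BUS_ADRERR in an SA_SIGINFO handler. A minimal user-space sketch follows;
bus_code_name(), on_sigbus() and install_sigbus_handler() are hypothetical
names for illustration, not anything in the kernel or the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* make sure the BUS_MCEERR_* codes are visible */
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: map a SIGBUS si_code to a short description. */
static const char *bus_code_name(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "hardware memory error, action required";
	case BUS_MCEERR_AO:
		return "hardware memory error, action optional";
	case BUS_ADRERR:
		return "nonexistent physical address";
	default:
		return "other SIGBUS cause";
	}
}

/* Report why SIGBUS arrived, then exit; the data cannot be trusted. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
	const char *why = bus_code_name(info->si_code);

	(void)sig;
	(void)ctx;
	/* write() is async-signal-safe; printf() is not. */
	(void)!write(STDERR_FILENO, why, strlen(why));
	(void)!write(STDERR_FILENO, "\n", 1);
	_exit(1);
}

/* Returns 0 on success, -1 on failure (as sigaction() does). */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A process that installs no handler at all is simply killed by the SIGBUS,
which is the behaviour the patch description asks for.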