Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed

From: Jane Chu
Date: Tue May 07 2024 - 13:54:36 EST


On 5/7/2024 2:02 AM, Oscar Salvador wrote:

> On Wed, May 01, 2024 at 05:24:56PM -0600, Jane Chu wrote:
>> For years when it comes down to kill a process due to hwpoison,
>> a SIGBUS is delivered only if unmap has been successful.
>> Otherwise, a SIGKILL is delivered. And the reason for that is
>> to prevent the involved process from accessing the hwpoisoned
>> page again.
>>
>> Since then a lot has changed, a hwpoisoned page is marked and
>> upon being re-accessed, the process will be killed immediately.
>> So let's take out the '!unmap_success' factor and try to deliver
>> SIGBUS if possible.
>
> I am missing some details here.
> An unmapped hwpoison page will trigger a fault and will return
> VM_FAULT_HWPOISON all the way down and then deliver SIGBUS,
> but if the page was not unmapped, how will this be catch upon
> re-accessing? Will the system deliver a MCE event?

I actually managed to hit the re-access case with an older version of Linux:
an MCE occurred but unmap failed, no SIGBUS was delivered, and the test
process re-accessed the same address over and over (hence MCE after MCE),
as the CPU was unable to make forward progress. In reality, this issue is
fixed by kill_accessing_process(). The comment for this patch refers to a
comment made about '!unmap_success' a long time ago.
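
For reference, a minimal userspace sketch of how a test process might
observe the SIGBUS discussed here. It is an illustration only, not code from
the patch: sigbus_reason() and install_sigbus_handler() are hypothetical
helper names, and the fallback values 4 and 5 are the Linux uapi si_code
values, used only if the libc headers do not already provide them.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Linux uapi si_code values; defined here only if libc does not expose them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical helper: map a SIGBUS si_code to a short description. */
static const char *sigbus_reason(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "hwpoison: action required";
    case BUS_MCEERR_AO:
        return "hwpoison: action optional";
    default:
        return "other SIGBUS";
    }
}

/* Report why SIGBUS arrived, then exit instead of returning. */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    const char *msg = sigbus_reason(info->si_code);

    (void)sig;
    (void)ucontext;

    /* Only async-signal-safe calls are used in the handler. */
    if (write(STDERR_FILENO, msg, strlen(msg)) < 0 ||
        write(STDERR_FILENO, "\n", 1) < 0)
        _exit(2);

    /* Returning would re-execute the access that faulted. */
    _exit(1);
}

/* Hypothetical helper: install the handler; returns 0 on success. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;   /* needed to receive siginfo_t */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A test program would call install_sigbus_handler() before touching the
affected mapping. The handler exits rather than returns, because returning
from it would retry the same access to the poisoned address.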

thanks,

-jane