Re: Re: Re: [PATCH 3/4] mm/memory-failure.c: optimize hwpoison_filter

From: zhenwei pi
Date: Sat May 07 2022 - 05:23:08 EST


On 5/7/22 16:20, Naoya Horiguchi wrote:
On Sat, May 07, 2022 at 08:28:05AM +0800, zhenwei pi wrote:

On 5/7/22 00:28, David Hildenbrand wrote:
On 06.05.22 15:38, zhenwei pi wrote:


On 5/6/22 16:59, Naoya Horiguchi wrote:
On Fri, Apr 29, 2022 at 10:22:05PM +0800, zhenwei pi wrote:
In the memory failure procedure, hwpoison_filter has higher priority:
if memory_filter() filters the error event, there is no need to do
any further work.

Could you clarify what problem you are trying to solve (what does
"optimize" mean in this context or what is the benefit)?


OK. The background of this work:
As is well known, the memory failure mechanism handles memory corruption
events and tries to send SIGBUS to the user process that uses the
corrupted page.

For the virtualization case, QEMU catches SIGBUS and tries to inject an MCE
into the guest, and the guest handles the memory failure again. Thus the
guest is only minimally affected by the hardware memory corruption.

The further steps I'm working on:
1, try to modify the code to decrement the poisoned page count in a single place
(mm/memory-failure.c: simplify num_poisoned_pages_dec in this series).

This is fine with me.


2, try to use page_handle_poison() to handle SetPageHWPoison() and
num_poisoned_pages_inc() together. It would be best to call
num_poisoned_pages_inc() in a single place too. I'm not sure whether this is
possible; please correct me if I misunderstand.

SetPageHWPoison() can be cancelled in memory_failure(), so simply bundling
it with num_poisoned_pages_inc() might not be optimal. I think that
action_result() is supposed to be called when memory error handling is
effective (not filtered, not cancelled). So moving num_poisoned_pages_inc()
(and notification code in your plan) into this function might be good.
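To make the suggested ordering concrete, here is a minimal userspace sketch, not the real mm/memory-failure.c code: every name with a `_sketch` suffix and the `filtered`/`cancelled` flags are hypothetical stand-ins. The filtered and cancelled paths return early, so the poisoned-page counter is bumped only inside the action_result()-like function, i.e. only when handling was effective.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the real kernel structures. */
struct page_sketch {
	bool hwpoison;   /* models PageHWPoison() */
	bool filtered;   /* models hwpoison_filter() rejecting the event */
	bool cancelled;  /* models a path that clears the poison flag again */
};

static long num_poisoned_sketch;  /* models the poisoned-page counter */

/* The single increment site: only reached when handling was effective. */
static void action_result_sketch(unsigned long pfn)
{
	(void)pfn;
	num_poisoned_sketch++;
}

/* Returns true if the event was handled and counted. */
static bool memory_failure_sketch(unsigned long pfn, struct page_sketch *p)
{
	assert(p != NULL);

	if (p->filtered)
		return false;          /* filtered first: no flag, no count */

	p->hwpoison = true;        /* models SetPageHWPoison() */
	if (p->cancelled) {
		p->hwpoison = false;   /* cancelled: undo the flag, no count */
		return false;
	}

	action_result_sketch(pfn);
	return true;
}
```

Bundling SetPageHWPoison() with the increment would also count the cancelled case, which is the concern raised above.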

OK, I'll remove this patch (mm/memory-failure.c: optimize hwpoison_filter) from this series and fix the other 3 patches in v2. Then I'll try to implement and test your suggestion in another series.


3, introduce a memory failure notifier list in memory-failure.c: notify
anyone registered on this list of the corrupted PFN.
If I can complete parts [1] and [2], [3] will be quite easy (just call the
notifier list after incrementing the poisoned page count).
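A minimal userspace model of step [3], assuming a simple singly linked list of callbacks. All names below are hypothetical; an in-kernel version would more likely reuse the kernel's existing notifier-chain infrastructure. It shows why [3] becomes easy once [1] and [2] are done: with a single increment site, notification is one extra loop right after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: told the corrupted PFN. */
typedef void (*mf_notify_fn)(unsigned long pfn, void *data);

struct mf_notifier_sketch {
	mf_notify_fn fn;
	void *data;
	struct mf_notifier_sketch *next;
};

static struct mf_notifier_sketch *mf_chain_head;  /* registered listeners */
static long num_poisoned_sketch;                  /* models the counter */

/* Hypothetical registration, e.g. by a balloon driver at probe time. */
static void mf_notifier_register_sketch(struct mf_notifier_sketch *nb)
{
	assert(nb && nb->fn);
	nb->next = mf_chain_head;
	mf_chain_head = nb;
}

/* The single accounting site: bump the counter, then notify listeners. */
static void mf_account_and_notify_sketch(unsigned long pfn)
{
	num_poisoned_sketch++;
	for (struct mf_notifier_sketch *nb = mf_chain_head; nb; nb = nb->next)
		nb->fn(pfn, nb->data);
}

/* Example listener: records the last PFN it was told about. */
static void record_pfn_sketch(unsigned long pfn, void *data)
{
	*(unsigned long *)data = pfn;
}
```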

4, introduce a memory recovery VQ for the memory balloon device, which registers
on the memory failure notifier list. While the guest kernel handles a memory
failure, the balloon device gets notified through the memory failure notifier
list and tells the host to recover the corrupted PFN (GPA) via the new VQ.

Most probably you might want to do that asynchronously, and once the
callback succeeds, un-poison the page.
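One way to picture the asynchronous flow, as a hypothetical sketch only (none of these names are virtio-balloon or kernel APIs): the notifier callback just queues the PFN without blocking, and a worker later asks the host to recover it and un-poisons the page only after the host reports success.

```c
#include <assert.h>
#include <stdbool.h>

#define RECOVER_QUEUE_LEN 8  /* hypothetical fixed capacity (power of two) */

struct recover_queue_sketch {
	unsigned long pfn[RECOVER_QUEUE_LEN];
	unsigned int head, tail;  /* head: next to pop, tail: next free slot */
};

typedef bool (*host_recover_fn)(unsigned long pfn);  /* models the VQ round trip */
typedef void (*unpoison_fn)(unsigned long pfn);      /* models un-poisoning */

/* Called from the notifier: never blocks; returns false if the queue is full. */
static bool recover_enqueue_sketch(struct recover_queue_sketch *q, unsigned long pfn)
{
	if (q->tail - q->head == RECOVER_QUEUE_LEN)
		return false;
	q->pfn[q->tail % RECOVER_QUEUE_LEN] = pfn;
	q->tail++;
	return true;
}

/* Worker: runs later, outside the failure path. Returns pages recovered. */
static unsigned int recover_worker_sketch(struct recover_queue_sketch *q,
					  host_recover_fn host_recover,
					  unpoison_fn unpoison)
{
	unsigned int done = 0;

	assert(host_recover && unpoison);
	while (q->head != q->tail) {
		unsigned long pfn = q->pfn[q->head % RECOVER_QUEUE_LEN];

		q->head++;
		if (host_recover(pfn)) {
			unpoison(pfn);  /* only after the host confirms */
			done++;
		}
		/* On failure the page simply stays poisoned. */
	}
	return done;
}

/* Example stand-ins: the "host" succeeds for even PFNs only. */
static unsigned int unpoisoned_count;
static bool host_recover_even_sketch(unsigned long pfn) { return (pfn % 2) == 0; }
static void unpoison_count_sketch(unsigned long pfn) { (void)pfn; unpoisoned_count++; }
```

Keeping the failure path non-blocking and deferring the host round trip to a worker matches the asynchronous approach suggested above; a page whose recovery fails simply stays poisoned.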

Yes!


5, the host side remaps the corrupted page (HVA) and tells the guest side to
unpoison the PFN (GPA). Then the guest fixes the corrupted page (GPA)
dynamically.

I think QEMU already does that during reboots. Now it would be triggered
by the guest for individual pages.

Yes, QEMU currently supports un-poisoning corrupted pages during
reset/reboot. We can reuse some of that code in this case; this
allows a VM to fix corrupted pages as soon as possible (with no need to
reset/reboot).

So this finally allows replacing a broken page mapped to the guest with
a healthy page without rebooting the guest. That sounds helpful.

Thanks,
Naoya Horiguchi

Yes, it's my plan. Thanks for your suggestions!



Because [4] and [5] are related to the balloon device, I'm also CCing Michael,
David and Jason.

Doesn't sound too crazy to me, although it's a shame that we always
have to use virtio-balloon for such fairly balloon-unrelated things.

Thanks!

--
zhenwei pi