Re: [PATCH v2] x86/mce: Fix endless loop when run task works after #MC

From: Ding Hui
Date: Wed Jul 07 2021 - 05:51:40 EST


On 2021/7/7 11:39, Ding Hui wrote:
On 2021/7/7 0:44, Luck, Tony wrote:
On Tue, Jul 06, 2021 at 08:16:06PM +0800, Ding Hui wrote:
Recently we encountered multiple #MCs on the same task before its
task_work_run() had been called, so current->mce_kill_me was added
to the task_works list more than once. That makes the task_works
list circular, so task_work_run() loops endlessly.
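
To illustrate the circular list described above, here is a minimal user-space sketch, not kernel code. It assumes that adding work means pushing the node onto the head of a singly linked list. struct cb_head, push(), walk() and nodes_after_adds() are hypothetical names for this example only, and the atomic operations the real task_works list uses are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a callback node; only the link matters here. */
struct cb_head {
    struct cb_head *next;
};

/* Push a node onto the head of a singly linked list (no atomics in this sketch). */
static void push(struct cb_head **head, struct cb_head *work)
{
    assert(head != NULL && work != NULL);
    work->next = *head;
    *head = work;
}

/* Walk the list, giving up after 'limit' nodes; returns the number visited. */
static int walk(const struct cb_head *head, int limit)
{
    int visited = 0;

    while (head != NULL && visited < limit) {
        head = head->next;
        visited++;
    }
    return visited;
}

/* Add the SAME node 'adds' times, then count how many nodes a walk visits. */
static int nodes_after_adds(int adds, int limit)
{
    struct cb_head *list = NULL;
    struct cb_head kill_me = { NULL };

    for (int i = 0; i < adds; i++)
        push(&list, &kill_me);  /* second push sets kill_me.next = &kill_me */
    return walk(list, limit);
}
```

With one add the walk visits a single node. With two adds of the same node, the node points to itself, so the walk only stops because of the artificial limit; a loop without such a limit would never terminate.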

I saw the same and posted a similar fix a while back:

https://www.spinics.net/lists/linux-mm/msg251006.html

It didn't get merged because some validation tests began failing
around the same time.  I'm now pretty sure I understand what happened
with those other tests.

I'll post my updated version (second patch in a three part series)
later today.


Thanks for your fixes.

After digging into my original problem, I think I have found out why I hit the #MC flood.

My test case:
1. run a qemu-kvm guest VM whose OS is memtest86+.iso
2. inject an SRAR UE into the VM's memory and wait for the #MC
When the VM triggers the #MC, I expect qemu to receive the SIGBUS signal ASAP, and with the modified qemu, I will kill the VM.

In this case, do_machine_check() may be called by kvm_machine_check() in vmx.c.

Before [1], memory_failure() was called in do_machine_check(), so TIF_SIGPENDING was set because of the SIGBUS signal; vcpu_run() checked for the pending signal and returned to qemu to handle SIGBUS.

After [1], do_machine_check() only adds task work and does not send SIGBUS directly. vcpu_run() does not break out of its for-loop, because vcpu_enter_guest() returns 1 and TIF_SIGPENDING is not set, so the task works are never executed until something else happens. KVM therefore enters the guest repeatedly and the #MC is triggered repeatedly.


Sorry for my incorrect description.

I figured out that my test kernel is not the latest; it lacks [2], commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function"), so vcpu_run() only checks signal_pending and not TIF_NOTIFY_RESUME, which is set in task_work_add().

After [2], the #MC flood should not occur.
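
The difference can be shown with a toy user-space model, not the real KVM code. It assumes that every guest entry hits the poisoned page again, and that the #MC handler only queues task work, which sets a "notify resume" flag and never a "signal pending" flag. All names here (FLAG_SIGPENDING, FLAG_NOTIFY_RESUME, enter_guest(), run_loop()) are made up for this sketch.

```c
#include <stdbool.h>

/* Toy thread flags; the names echo the kernel's, the values are arbitrary. */
enum {
    FLAG_SIGPENDING    = 1u << 0,
    FLAG_NOTIFY_RESUME = 1u << 1,
};

/*
 * Toy guest entry: assume each entry triggers one #MC, and the handler
 * queues task work, which only raises FLAG_NOTIFY_RESUME.
 */
static void enter_guest(unsigned int *flags, int *mc_count)
{
    (*mc_count)++;
    *flags |= FLAG_NOTIFY_RESUME;
}

/*
 * Toy run loop. Returns how many #MCs were taken before the loop left
 * for user space, capped at max_entries so the sketch always terminates.
 */
static int run_loop(bool check_notify_resume, int max_entries)
{
    unsigned int flags = 0;
    unsigned int exit_mask = FLAG_SIGPENDING;
    int mc_count = 0;

    if (check_notify_resume)
        exit_mask |= FLAG_NOTIFY_RESUME;

    while (mc_count < max_entries) {
        enter_guest(&flags, &mc_count);
        if (flags & exit_mask)
            break;  /* leave the loop so the pending work can run */
    }
    return mc_count;
}
```

In this model, run_loop(false, 1000) takes #MCs until the cap is reached, which corresponds to the flood on the older kernel, while run_loop(true, 1000) leaves the loop after the first #MC.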

Thanks also to Thomas Gleixner.

Could you consider fixing cases like this?

And would you mind giving me some advice on my temporary workaround for this #MC flood:
I want to check whether do_machine_check() is running in exception context or KVM context, and fall back to calling kill_me_xxx directly when in KVM context. (I have already tested this briefly and it met my expectations.)


So please ignore my request.

[1]: commit 5567d11c21a1 ("x86/mce: Send #MC singal from task work")


--
Thanks,
- Ding Hui