Re: [PATCH] iommu: Avoid softlockup and rcu stall in fq_flush_timeout().

From: Peng Zhang
Date: Thu Feb 16 2023 - 06:03:21 EST




On 2023/2/16 16:49, Hillf Danton wrote:
> On Thu, 16 Feb 2023 15:11:48 +0800 Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx>
>> There is softlockup under fio pressure test with smmu enabled:
>> watchdog: BUG: soft lockup - CPU#81 stuck for 22s! [swapper/81:0]
>
> What is your kernel version?
The rcu stall occurs on kernel version 5.4.
I did not run the test in which the softlockup happened, so I don't know
which kernel version it used. However, the code logic there is the same as
that of fq_flush_timeout in the mainline kernel.

>> This is because the timer callback fq_flush_timeout may run for more than
>> 10ms, and timers may be processed continuously in softirq context, which
>> triggers the softlockup and rcu stall. We can use work to handle
>> fq_ring_free for each cpu, which may take a long time, to avoid
>> triggering the softlockup and rcu stall.
>>
>> This patch is modified from the patch[1] of openEuler.

> Because of a timer hog observed on your system with 128 CPUs, for instance,
> does it make any sense to ask Peter to apply the patch for his 2-CPU box?
What is a 2-CPU box?