Re: [PATCH net] net/mlx5: Avoid deadlock between PCI error recovery and health reporter

From: Shay Drori
Date: Mon Aug 11 2025 - 08:06:13 EST

In reply to: Gerd Bayer: "Re: [PATCH net] net/mlx5: Avoid deadlock between PCI error recovery and health reporter"

On 11/08/2025 10:29, Gerd Bayer wrote:


On Sun, 2025-08-10 at 14:51 +0300, Shay Drori wrote:

On 07/08/2025 16:11, Gerd Bayer wrote:


During error recovery testing, a pair of tasks was reported to be hung
due to a deadlock:

- mlx5_unload_one() trying to acquire devlink lock while the PCI error
recovery code had acquired the pci_cfg_access_lock().

Could you please add traces here?
I looked at the code and didn't see where pci_cfg_access_lock() is
taken...

Sure thing. This is the original hung task message:

[10144.859042] mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working
[10320.359160] INFO: task kmcheck:72 blocked for more than 122 seconds.
[10320.359169] Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
[10320.359171] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10320.359172] task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000
[10320.359176] Call Trace:
[10320.359178] [<000000065256f030>] __schedule+0x2a0/0x590
[10320.359187] [<000000065256f356>] schedule+0x36/0xe0
[10320.359189] [<000000065256f572>] schedule_preempt_disabled+0x22/0x30
[10320.359192] [<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8
[10320.359194] [<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core]
[10320.359360] [<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core]
[10320.359400] [<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398
[10320.359406] [<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0
[10320.359411] [<00000006522b3958>] chsc_process_event_information.constprop.0+0x1c8/0x1e8
[10320.359416] [<00000006522baf1a>] crw_collect_info+0x272/0x338
[10320.359418] [<0000000651bc9de0>] kthread+0x108/0x110
[10320.359422] [<0000000651b42ea4>] __ret_from_fork+0x3c/0x58
[10320.359425] [<0000000652576642>] ret_from_fork+0xa/0x30
[10320.359440] INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds.
[10320.359441] Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
[10320.359442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10320.359443] task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000
[10320.359447] Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
[10320.359492] Call Trace:
[10320.359521] [<000000065256f030>] __schedule+0x2a0/0x590
[10320.359524] [<000000065256f356>] schedule+0x36/0xe0
[10320.359526] [<0000000652172e28>] pci_wait_cfg+0x80/0xe8
[10320.359532] [<0000000652172f94>] pci_cfg_access_lock+0x74/0x88
[10320.359534] [<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core]
[10320.359585] [<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core]
[10320.359637] [<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core]
[10320.359680] [<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168
[10320.359683] [<0000000652513212>] devlink_health_report+0x19a/0x230
[10320.359685] [<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core]
[10320.359728] [<0000000651bbf852>] process_one_work+0x1c2/0x458
[10320.359733] [<0000000651bc073e>] worker_thread+0x3ce/0x528
[10320.359735] [<0000000651bc9de0>] kthread+0x108/0x110
[10320.359737] [<0000000651b42ea4>] __ret_from_fork+0x3c/0x58
[10320.359739] [<0000000652576642>] ret_from_fork+0xa/0x30

The pci_cfg_access_lock() is acquired in zpci_event_attempt_error_recovery() by way of pci_dev_lock().

Thanks a lot!
Can you please add the above to the commit message?
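
The lock-order inversion that the two traces point to can be modelled in plain C. This is only a minimal sketch, not driver code: every `model_*` name is hypothetical, the "locks" are simple owner fields rather than kernel primitives, and it assumes (inferred from the second trace, not stated in it) that the health worker holds the devlink lock inside devlink_health_report() while it waits in pci_cfg_access_lock().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two tasks seen in the hung-task report. */
enum model_task { MODEL_NONE, MODEL_RECOVERY, MODEL_HEALTH };

/* Hypothetical lock model: only records who owns it. */
struct model_lock {
	enum model_task owner;
};

/* Returns false where a real sleeping lock would block the caller. */
static bool model_acquire(struct model_lock *lock, enum model_task task)
{
	if (lock->owner != MODEL_NONE)
		return false;
	lock->owner = task;
	return true;
}

/* Replays the acquisition order from the traces; true means both are stuck. */
bool model_lock_inversion(void)
{
	struct model_lock cfg_access = { MODEL_NONE };
	struct model_lock devlink = { MODEL_NONE };
	bool recovery_has_cfg, health_has_devlink;
	bool recovery_blocked, health_blocked;

	/* kmcheck: pci_dev_lock() takes the cfg access lock. */
	recovery_has_cfg = model_acquire(&cfg_access, MODEL_RECOVERY);
	/* health worker: assumed to hold the devlink lock while dumping. */
	health_has_devlink = model_acquire(&devlink, MODEL_HEALTH);
	assert(recovery_has_cfg && health_has_devlink);

	/* kmcheck: mlx5_unload_one() now wants the devlink lock. */
	recovery_blocked = !model_acquire(&devlink, MODEL_RECOVERY);
	/* health worker: mlx5_vsc_gw_lock() now wants the cfg access lock. */
	health_blocked = !model_acquire(&cfg_access, MODEL_HEALTH);

	return recovery_blocked && health_blocked;
}
```

model_lock_inversion() returns true because each task's second lock is already owned by the other task, which matches the pair of blocked tasks in the report.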

snip
<...>

---
drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
index 432c98f2626d..d2d3b57a57d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -73,16 +73,15 @@ int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev)
 	u32 lock_val;
 	int ret;

+	if (pci_channel_offline(dev->pdev))
+		return -EACCES;
+

There is still a race here.
It is possible that mlx5 has passed the above check while
zpci_event_attempt_error_recovery() has already taken the cfg access
lock but has not yet changed the pdev to the error state :(

 	pci_cfg_access_lock(dev->pdev);
 	do {
 		if (retries > VSC_MAX_RETRIES) {
 			ret = -EBUSY;
 			goto pci_unlock;
 		}
-		if (pci_channel_offline(dev->pdev)) {
-			ret = -EACCES;
-			goto pci_unlock;
-		}

 		/* Check if semaphore is already locked */
 		ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val);
--
2.48.1
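
The remaining window noted in the inline comment on the new pci_channel_offline() check can be illustrated with a small single-threaded replay. This is a sketch under stated assumptions, not the driver's logic: the `race_*` names and the two boolean fields are hypothetical, and the order "take the cfg access lock first, mark the device offline later" is taken from the comment above rather than verified against the recovery code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model, not a kernel structure. */
struct race_pdev {
	bool cfg_access_locked; /* models pci_cfg_access_lock() being held */
	bool channel_offline;   /* models what pci_channel_offline() reports */
};

enum race_result { RACE_EACCES, RACE_WOULD_BLOCK, RACE_LOCKED };

/* Models the early check the patch adds before taking the lock. */
static bool race_check_passes(const struct race_pdev *p)
{
	return !p->channel_offline;
}

/* Models the lock attempt; a real caller would sleep instead of returning. */
static enum race_result race_take_lock(struct race_pdev *p)
{
	if (p->cfg_access_locked)
		return RACE_WOULD_BLOCK;
	p->cfg_access_locked = true;
	return RACE_LOCKED;
}

/* Racy order: check, then recovery locks, then health worker locks. */
enum race_result race_racy_interleaving(void)
{
	struct race_pdev p = { false, false };
	enum race_result res;

	/* health worker: device still looks online, so the check passes */
	if (!race_check_passes(&p))
		return RACE_EACCES;

	/* recovery: takes the cfg access lock, error state not yet set */
	assert(!p.cfg_access_locked);
	p.cfg_access_locked = true;

	/* health worker: proceeds to the lock and would sleep on it */
	res = race_take_lock(&p);

	/* recovery: marks the device offline only now, after the check ran */
	p.channel_offline = true;
	return res;
}

/* Benign order: recovery finishes both steps before the check runs. */
enum race_result race_benign_interleaving(void)
{
	struct race_pdev p = { false, false };

	p.cfg_access_locked = true;
	p.channel_offline = true;
	if (!race_check_passes(&p))
		return RACE_EACCES;
	return race_take_lock(&p);
}
```

In this model race_racy_interleaving() ends in RACE_WOULD_BLOCK while race_benign_interleaving() ends in RACE_EACCES, so the early check only helps when the error state is already visible before the check runs.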
