Re: [PATCH net v2] net/smc: Transitional solution for clcsock race issue

From: Wen Gu
Date: Fri Jan 21 2022 - 08:07:17 EST



On 2022/1/21 8:43 pm, Wen Gu wrote:
We encountered a crash in smc_setsockopt() and it is caused by
accessing smc->clcsock after clcsock was released.

BUG: kernel NULL pointer dereference, address: 0000000000000020
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53
RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
Call Trace:
<TASK>
__sys_setsockopt+0xfc/0x190
__x64_sys_setsockopt+0x20/0x30
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f16ba83918e
</TASK>

This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.

To guard against a crash with the same cause in smc_getsockopt()
or smc_switch_to_fallback(), this patch also checks smc->clcsock
in those functions. The caller of smc_switch_to_fallback() uses the
return value to determine whether the fallback succeeded.
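
For illustration only, the lock-then-check pattern described above can be
sketched in user space roughly as follows. This is not the patch itself: the
fake_* names and structs are hypothetical stand-ins, a pthread mutex models
clcsock_release_lock, and the -EBADF error code is an assumption.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real SMC structs. */
struct fake_clcsock {
	int last_optname;
};

struct fake_smc_sock {
	pthread_mutex_t clcsock_release_lock;	/* models smc->clcsock_release_lock */
	struct fake_clcsock *clcsock;		/* NULL once released */
};

/* Release path: clear the pointer while holding the lock. */
static void fake_release_clcsock(struct fake_smc_sock *smc)
{
	pthread_mutex_lock(&smc->clcsock_release_lock);
	smc->clcsock = NULL;
	pthread_mutex_unlock(&smc->clcsock_release_lock);
}

/* Access path: take the lock, then check the pointer before using it. */
static int fake_setsockopt(struct fake_smc_sock *smc, int optname)
{
	int rc = 0;

	pthread_mutex_lock(&smc->clcsock_release_lock);
	if (!smc->clcsock) {
		rc = -EBADF;	/* assumed error code for "already released" */
		goto out;
	}
	smc->clcsock->last_optname = optname;
out:
	pthread_mutex_unlock(&smc->clcsock_release_lock);
	return rc;
}

/* Fallback path: report failure through the return value. */
static int fake_switch_to_fallback(struct fake_smc_sock *smc)
{
	int rc = 0;

	pthread_mutex_lock(&smc->clcsock_release_lock);
	if (!smc->clcsock)
		rc = -EBADF;	/* caller must not assume fallback succeeded */
	pthread_mutex_unlock(&smc->clcsock_release_lock);
	return rc;
}
```

Because the release path and the access paths take the same lock, an accessor
in this sketch either sees a valid pointer for the whole critical section or
sees NULL and returns an error instead of dereferencing it.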

Fixes: fd57770dd198 ("net/smc: wait for pending work before clcsock release_sock")
Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@xxxxxxxxxxxxx/T/
Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
Acked-by: Karsten Graul <kgraul@xxxxxxxxxxxxx>
---

I seem to have missed this:

---
v1 -> v2:

Add 'Fixes:' tag and 'Link:' tag.
---


Looks like I need a script to check these details to avoid such mistakes...


Thanks,
Wen Gu