Re: [PATCH net] net/smc: Transitional solution for clcsock race issue

From: Wen Gu
Date: Fri Jan 21 2022 - 02:05:08 EST



On 2022/1/13 11:02 pm, Wen Gu wrote:
We encountered a crash in smc_setsockopt() caused by accessing
smc->clcsock after clcsock had been released.

BUG: kernel NULL pointer dereference, address: 0000000000000020
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53
RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
Call Trace:
<TASK>
__sys_setsockopt+0xfc/0x190
__x64_sys_setsockopt+0x20/0x30
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f16ba83918e
</TASK>

This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.

To prevent the same kind of crash in smc_getsockopt() and
smc_switch_to_fallback(), this patch also checks smc->clcsock in
them. Callers of smc_switch_to_fallback() use its return value to
determine whether the fallback succeeded.

Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
---
net/smc/af_smc.c | 63 +++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 51 insertions(+), 12 deletions(-)
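
The locking pattern described in the commit message above can be sketched
in user space as follows. This is only an illustration under stated
assumptions, not the code from the patch: struct outer_sock, struct
inner_sock and the outer_*() helpers are hypothetical stand-ins for
struct smc_sock, smc->clcsock and the smc_*() functions, a pthread mutex
plays the role of clcsock_release_lock, and -EBADF is an error code chosen
for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the internal socket (smc->clcsock). */
struct inner_sock {
	int opt;
};

/* Hypothetical stand-in for the outer socket (struct smc_sock). */
struct outer_sock {
	pthread_mutex_t release_lock;	/* role of clcsock_release_lock */
	struct inner_sock *inner;	/* NULL once released */
	int use_fallback;
};

/* Release path: clear the pointer while holding the release lock. */
void outer_release(struct outer_sock *s)
{
	pthread_mutex_lock(&s->release_lock);
	s->inner = NULL;
	pthread_mutex_unlock(&s->release_lock);
}

/* Accessor: take the lock, then check the pointer before using it. */
int outer_setopt(struct outer_sock *s, int val)
{
	int rc = 0;

	pthread_mutex_lock(&s->release_lock);
	if (!s->inner)
		rc = -EBADF;	/* assumed error code, for illustration */
	else
		s->inner->opt = val;
	pthread_mutex_unlock(&s->release_lock);
	return rc;
}

/* Fallback switch: report failure so the caller can react to it. */
int outer_switch_to_fallback(struct outer_sock *s)
{
	int rc = 0;

	pthread_mutex_lock(&s->release_lock);
	if (!s->inner)
		rc = -EBADF;	/* assumed error code, for illustration */
	else
		s->use_fallback = 1;
	pthread_mutex_unlock(&s->release_lock);
	return rc;
}
```

Because the release path and every accessor take the same lock, an accessor
either sees a valid pointer for the whole critical section or sees NULL and
returns an error; a caller of outer_switch_to_fallback() checks the return
value before assuming the fallback took effect.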


Sorry to bother you; I just wonder whether this patch needs further improvement.

The previous discussion can be found at:
https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@xxxxxxxxxxxxx/T/

I sent this patch with a new subject instead of sending a v2 of the previously
discussed patch because I think the original subject is no longer appropriate
after introducing the check of clcsock in smc_switch_to_fallback().

Thanks,
Wen Gu