Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination

From: Karsten Graul
Date: Wed Dec 29 2021 - 07:56:54 EST


On 28/12/2021 16:13, Wen Gu wrote:
> We encountered some crashes caused by the race between the access
> and the termination of link groups.

While I agree with the problems you found, I am not sure the solution is the right one.
At the moment conn->lgr is checked all over the code as an indication of whether a connection
still has a valid link group. If you change these semantics by leaving conn->lgr set
after the connection has been unregistered from its link group, I expect various new problems
to appear.

For me the right solution would be to use correct locking before conn->lgr is checked and used.

In smc_lgr_unregister_conn() the lgr->conns_lock is used when conn->lgr is unset (note that
it is better to have that "conn->lgr = NULL;" line INSIDE the lock in this function).

And at every place in the code where conn->lgr is used, you take the read_lock while lgr is accessed.
This could solve the problem, using existing mechanisms, right? Opinions?