Re: [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear

From: Karsten Graul
Date: Wed Dec 29 2021 - 07:51:39 EST


On 28/12/2021 16:13, Wen Gu wrote:
> We encountered some crashes caused by the race between SMC-R
> link access and link clear triggered by link group termination
> in abnormal case, like port error.

Without digging deeper into this: there is already a refcount for links, see smc_wr_tx_link_hold().
In smc_wr_free_link() there are waits for the refcounts to become zero.
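
To make the hold/put idea concrete, here is a minimal userspace sketch of the pattern, not the kernel code. All names (demo_link, demo_link_hold() and so on) are hypothetical, and C11 atomics stand in for the kernel primitives. A user takes a reference before touching the link, teardown first marks the link unusable, and the link may only be freed once the count has drained to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a link; NOT the kernel's struct smc_link. */
struct demo_link {
	atomic_bool usable;	/* cleared when teardown starts */
	atomic_int refcnt;	/* users currently working on the link */
};

static void demo_link_init(struct demo_link *lnk)
{
	atomic_init(&lnk->usable, true);
	atomic_init(&lnk->refcnt, 0);
}

/*
 * Take a reference. The count is raised before the usable flag is
 * checked, so a teardown that has cleared the flag and then sees a
 * zero count cannot miss a user that is about to enter.
 */
static bool demo_link_hold(struct demo_link *lnk)
{
	atomic_fetch_add(&lnk->refcnt, 1);
	if (!atomic_load(&lnk->usable)) {
		atomic_fetch_sub(&lnk->refcnt, 1);
		return false;
	}
	return true;
}

/* Drop a reference taken by a successful demo_link_hold(). */
static void demo_link_put(struct demo_link *lnk)
{
	int prev = atomic_fetch_sub(&lnk->refcnt, 1);

	assert(prev > 0);	/* a put without a matching hold is a bug */
}

/* Start teardown: from now on every new hold fails. */
static void demo_link_begin_clear(struct demo_link *lnk)
{
	atomic_store(&lnk->usable, false);
}

/* True once no user holds the link; only then is freeing safe. */
static bool demo_link_drained(struct demo_link *lnk)
{
	return atomic_load(&lnk->refcnt) == 0;
}
```

In this sketch the teardown path would call demo_link_begin_clear() and then wait until demo_link_drained() returns true before releasing any resources. A real implementation would sleep on a wait queue rather than poll.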

Why do you need to introduce another refcount instead of using the existing one?
And if you have a good reason, do we still need the existing refcounting with your new
implementation?

Maybe it's enough to use the existing refcounting in the other functions like smc_llc_flow_initiate()?

Btw: it would be interesting to know what kind of crashes you see; we never hit them in our setup.
It's great to see you evaluating SMC in a cloud environment!