Re: [PATCH net-next] devlink: Execute devlink health recover as a work

From: Moshe Shemesh
Date: Fri Apr 26 2019 - 09:04:38 EST




On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
> On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
>>>> @@ -4813,7 +4831,11 @@ static int
>>>> devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
>>>> if (!reporter)
>>>> return -EINVAL;
>>>>
>>>> - return devlink_health_reporter_recover(reporter, NULL);
>>>> + if (!reporter->ops->recover)
>>>> + return -EOPNOTSUPP;
>>>> +
>>>> + queue_work(devlink->reporters_wq, &reporter->recover_work);
>>>> + return 0;
>>>> }
>>>
>>> So the recover user space request will no longer return the status,
>>> and it will not actually wait for the recover to happen. Leaving user
>>> pondering - did the recover run and fail, or did it not get run
>>> yet...
>>>
>>
>> wait_for_completion_interruptible_timeout is missing from the design ?
>
> Perhaps, but I think it's better to avoid the async execution of
> the recover altogether. Perhaps it's better to refcount the
> reporters on the call to recover_doit? Or some such.. :)
>

I tried using a refcount instead of the devlink lock here. But once I
get to reporter destroy I have to wait for the refcount to drop, and I
am not sure whether I should release the reporter after some timeout
or wait endlessly for the refcount. Neither option seems good.