Re: [PATCH net-next] devlink: Execute devlink health recover as a work

From: Jakub Kicinski
Date: Fri Apr 26 2019 - 12:19:54 EST


On Fri, Apr 26, 2019 at 6:04 AM Moshe Shemesh <moshe@xxxxxxxxxxxx> wrote:
> On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
> > On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
> >>>> @@ -4813,7 +4831,11 @@ static int
> >>>> devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
> >>>> if (!reporter)
> >>>> return -EINVAL;
> >>>>
> >>>> - return devlink_health_reporter_recover(reporter, NULL);
> >>>> + if (!reporter->ops->recover)
> >>>> + return -EOPNOTSUPP;
> >>>> +
> >>>> + queue_work(devlink->reporters_wq, &reporter->recover_work);
> >>>> + return 0;
> >>>> }
> >>>
> >>> So the recover user space request will no longer return the status,
> >>> and
> >>> it will not actually wait for the recover to happen. Leaving user
> >>> pondering - did the recover run and fail, or did it nor get run
> >>> yet...
> >>>
> >>
> >> wait_for_completion_interruptible_timeout is missing from the design ?
> >
> > Perhaps, but I think its better to avoid the async execution of
> > the recover all together. Perhaps its better to refcount the
> > reporters on the call to recover_doit? Or some such.. :)
> >
>
> I tried using refcount instead of devlink lock here. But once I get to
> reporter destroy I wait for the refcount and not sure if I should
> release the reporter after some timeout or have endless wait for
> refcount. Both options seem not good.

Well you should "endlessly" wait. Why would the refcount not drop,
you have to remove it from the list first, so no new operations can
start, right?
In principle there is no difference between waiting for refcount to
drop, flushing the work, or waiting for the devlink lock if reporter
holds it?