Re: [PATCH net-next RFC v3 01/14] devlink: Add reload action option to devlink reload command

From: Jakub Kicinski
Date: Fri Sep 04 2020 - 15:56:52 EST


On Fri, 4 Sep 2020 11:04:50 +0200 Jiri Pirko wrote:
> Thu, Sep 03, 2020 at 09:47:19PM CEST, kuba@xxxxxxxxxx wrote:
> >On Thu, 3 Sep 2020 07:57:29 +0200 Jiri Pirko wrote:
> >> Wed, Sep 02, 2020 at 05:30:25PM CEST, kuba@xxxxxxxxxx wrote:
> >> >On Wed, 2 Sep 2020 11:46:27 +0200 Jiri Pirko wrote:
> >> >> >? Do we need such change there too or keep it as is, each action by itself
> >> >> >and return what was performed ?
> >> >>
> >> >> Well, I don't know. User asks for X, X should be performed, not Y or Z.
> >> >> So perhaps the return value is not needed.
> >> >> Just driver advertizes it supports X, Y, Z and the users says:
> >> >> 1) do X, driver does X
> >> >> 2) do Y, driver does Y
> >> >> 3) do Z, driver does Z
> >> >> [
> >> >> I think this kindof circles back to the original proposal...
> >> >
> >> >Why? User does not care if you activate new devlink params when
> >> >activating new firmware. Trust me. So why make the user figure out
> >> >which of all possible reset option they should select? If there is
> >> >a legitimate use case to limit what is reset - it should be handled
> >> >by a separate negative attribute, like --live which says don't reset
> >> >anything.
> >>
> >> I see. Okay. Could you please sum-up the interface as you propose it?
> >
> >What I proposed on v1, pass requested actions as a bitfield, driver may
> >perform more actions, we can return performed actions in the response.
>
> Okay. So for example for mlxsw, user might say:
> 1) I want driver reinit
> kernel reports: fw reset and driver reinit was done
> 2) I want fw reset
> kernel reports: fw reset and driver reinit was done
> 3) I want fw reset and driver reinit
> kernel reports: fw reset and driver reinit was done

Yup.

> >Then separate attribute to carry constraints for the request, like
> >--live.
>
> Hmm, this is a bit unclear how it is supposed to work. The constraints
> apply for all? I mean, the actions are requested by a bitfield.
> So the user can say:
> I want fw reset and driver reinit --live. "--live" applies to both fw
> reset and driver reinit? That is odd.

The way I was thinking about it - the constraint expresses what sort of
downtime the user can accept. So yes, it'd apply to all. If any of the
reset actions does not meet the constraint then error should be
returned.

In that sense I don't like --live because it doesn't really say much.
AFAIU it means 1) no link flap; 2) < 2 sec datapath downtime; 3) no
configuration is lost in kernel or device (including netdev config,
link config, flow rules, counters etc.). I was hoping at least the
documentation in patch 14 would be more precise.

I think you're saying that it's strange to express that as a constraint
because internally it maps to one of two fw reset types. And there is
only one driver reinit procedure. But I don't think that the
distinction of reset types is valuable to the user. What matters is if
application SLAs will be met or not.

I assume that deeper/longer reset is always less risky and would be
preferred unless constraint is specified.

> >I'd think the supported actions in devlink_ops would be fine as a
> >bitfield, too. Combinations are often hard to capture in static data.