Re: [PATCH net-next RFC v3 01/14] devlink: Add reload action option to devlink reload command

From: Moshe Shemesh
Date: Mon Sep 07 2020 - 13:22:21 EST



On 9/4/2020 10:56 PM, Jakub Kicinski wrote:
External email: Use caution opening links or attachments


On Fri, 4 Sep 2020 11:04:50 +0200 Jiri Pirko wrote:
Thu, Sep 03, 2020 at 09:47:19PM CEST, kuba@xxxxxxxxxx wrote:
On Thu, 3 Sep 2020 07:57:29 +0200 Jiri Pirko wrote:
Wed, Sep 02, 2020 at 05:30:25PM CEST, kuba@xxxxxxxxxx wrote:
On Wed, 2 Sep 2020 11:46:27 +0200 Jiri Pirko wrote:
? Do we need such change there too or keep it as is, each action by itself
and return what was performed ?
Well, I don't know. User asks for X, X should be performed, not Y or Z.
So perhaps the return value is not needed.
Just driver advertizes it supports X, Y, Z and the users says:
1) do X, driver does X
2) do Y, driver does Y
3) do Z, driver does Z
[
I think this kindof circles back to the original proposal...
Why? User does not care if you activate new devlink params when
activating new firmware. Trust me. So why make the user figure out
which of all possible reset option they should select? If there is
a legitimate use case to limit what is reset - it should be handled
by a separate negative attribute, like --live which says don't reset
anything.
I see. Okay. Could you please sum-up the interface as you propose it?
What I proposed on v1, pass requested actions as a bitfield, driver may
perform more actions, we can return performed actions in the response.
Okay. So for example for mlxsw, user might say:
1) I want driver reinit
kernel reports: fw reset and driver reinit was done
2) I want fw reset
kernel reports: fw reset and driver reinit was done
3) I want fw reset and driver reinit
kernel reports: fw reset and driver reinit was done
Yup.

Then separate attribute to carry constraints for the request, like
--live.
Hmm, this is a bit unclear how it is supposed to work. The constraints
apply for all? I mean, the actions are requested by a bitfield.
So the user can say:
I want fw reset and driver reinit --live. "--live" applies to both fw
reset and driver reinit? That is odd.
The way I was thinking about it - the constraint expresses what sort of
downtime the user can accept. So yes, it'd apply to all. If any of the
reset actions does not meet the constraint then error should be
returned.

In that sense I don't like --live because it doesn't really say much.
AFAIU it means 1) no link flap; 2) < 2 sec datapath downtime; 3) no
configuration is lost in kernel or device (including netdev config,
link config, flow rules, counters etc.). I was hoping at least the
documentation in patch 14 would be more precise.


Actually, while writing "no-reset" or "live-patching" I meant also no downtime at all and nothing resets (config, rules ... anything), that fits mlx5 live-patching.

However, to make it more generic,  I can allow few seconds downtime and add similar constrains as you mentioned here to "no-reset". I will add that to the documentation patch.

I think you're saying that it's strange to express that as a constraint
because internally it maps to one of two fw reset types. And there is
only one driver reinit procedure. But I don't think that the
distinction of reset types is valuable to the user. What matters is if
application SLAs will be met or not.

I assume that deeper/longer reset is always less risky and would be
preferred unless constraint is specified.

I'd think the supported actions in devlink_ops would be fine as a
bitfield, too. Combinations are often hard to capture in static data.