Re: [PATCH net-next 05/16] devlink: Add remote reload stats

From: Jacob Keller
Date: Mon Oct 05 2020 - 15:19:58 EST




On 10/4/2020 12:09 AM, Moshe Shemesh wrote:
>
> On 10/3/2020 12:05 PM, Jiri Pirko wrote:
>> Thu, Oct 01, 2020 at 03:59:08PM CEST, moshe@xxxxxxxxxxxx wrote:
>>> Add remote reload stats to hold the history of actions performed due
>>> devlink reload commands initiated by remote host. For example, in case
>>> firmware activation with reset finished successfully but was initiated
>>> by remote host.
>>>
>>> The function devlink_remote_reload_actions_performed() is exported to
>>> enable drivers update on remote reload actions performed as it was not
>>> initiated by their own devlink instance.
>>>
>>> Expose devlink remote reload stats to the user through devlink dev get
>>> command.
>>>
>>> Examples:
>>> $ devlink dev show
>>> pci/0000:82:00.0:
>>> stats:
>>> reload_stats:
>>> driver_reinit 2
>>> fw_activate 1
>>> fw_activate_no_reset 0
>>> remote_reload_stats:
>>> driver_reinit 0
>>> fw_activate 0
>>> fw_activate_no_reset 0
>>> pci/0000:82:00.1:
>>> stats:
>>> reload_stats:
>>> driver_reinit 1
>>> fw_activate 0
>>> fw_activate_no_reset 0
>>> remote_reload_stats:
>>> driver_reinit 1
>>> fw_activate 1
>>> fw_activate_no_reset 0
>>>
>>> $ devlink dev show -jp
>>> {
>>> "dev": {
>>> "pci/0000:82:00.0": {
>>> "stats": {
>>> "reload_stats": [ {
>>> "driver_reinit": 2
>>> },{
>>> "fw_activate": 1
>>> },{
>>> "fw_activate_no_reset": 0
>>> } ],
>>> "remote_reload_stats": [ {
>>> "driver_reinit": 0
>>> },{
>>> "fw_activate": 0
>>> },{
>>> "fw_activate_no_reset": 0
>>> } ]
>>> }
>>> },
>>> "pci/0000:82:00.1": {
>>> "stats": {
>>> "reload_stats": [ {
>>> "driver_reinit": 1
>>> },{
>>> "fw_activate": 0
>>> },{
>>> "fw_activate_no_reset": 0
>>> } ],
>>> "remote_reload_stats": [ {
>>> "driver_reinit": 1
>>> },{
>>> "fw_activate": 1
>>> },{
>>> "fw_activate_no_reset": 0
>>> } ]
>>> }
>>> }
>>> }
>>> }
>>>
>>> Signed-off-by: Moshe Shemesh <moshe@xxxxxxxxxxxx>
>>> ---
>>> RFCv5 -> v1:
>>> - Resplit this patch and the previous one by remote/local reload stats
>>> instead of set/get reload stats
>>> - Rename reload_action_stats to reload_stats
>>> RFCv4 -> RFCv5:
>>> - Add remote actions stats
>>> - If devlink reload is not supported, show only remote_stats
>>> RFCv3 -> RFCv4:
>>> - Renamed DEVLINK_ATTR_RELOAD_ACTION_CNT to
>>> DEVLINK_ATTR_RELOAD_ACTION_STAT
>>> - Add stats per action per limit level
>>> RFCv2 -> RFCv3:
>>> - Add reload actions counters instead of supported reload actions
>>> (reload actions counters are only for supported action so no need for
>>> both)
>>> RFCv1 -> RFCv2:
>>> - Removed DEVLINK_ATTR_RELOAD_DEFAULT_LEVEL
>>> - Removed DEVLINK_ATTR_RELOAD_LEVELS_INFO
>>> - Have actions instead of levels
>>> ---
>>> include/net/devlink.h | 1 +
>>> include/uapi/linux/devlink.h | 1 +
>>> net/core/devlink.c | 49 +++++++++++++++++++++++++++++++-----
>>> 3 files changed, 45 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>>> index 0f3bd23b6c04..a4ccb83bbd2c 100644
>>> --- a/include/net/devlink.h
>>> +++ b/include/net/devlink.h
>>> @@ -42,6 +42,7 @@ struct devlink {
>>> const struct devlink_ops *ops;
>>> struct xarray snapshot_ids;
>>> u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
>>> + u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
>> Perhaps a nested struct {} stats?
> I guess you mean struct that holds these two arrays.
>>
>>> struct device *dev;
>>> possible_net_t _net;
>>> struct mutex lock; /* Serializes access to devlink instance specific objects such as
>>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>>> index 97e0137f6201..f9887d8afdc7 100644
>>> --- a/include/uapi/linux/devlink.h
>>> +++ b/include/uapi/linux/devlink.h
>>> @@ -530,6 +530,7 @@ enum devlink_attr {
>>> DEVLINK_ATTR_RELOAD_STATS, /* nested */
>>> DEVLINK_ATTR_RELOAD_STATS_ENTRY, /* nested */
>>> DEVLINK_ATTR_RELOAD_STATS_VALUE, /* u32 */
>>> + DEVLINK_ATTR_REMOTE_RELOAD_STATS, /* nested */
>>>
>>> /* add new attributes above here, update the policy in devlink.c */
>>>
>>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>>> index 05516f1e4c3e..3b6bd3b4d346 100644
>>> --- a/net/core/devlink.c
>>> +++ b/net/core/devlink.c
>>> @@ -523,28 +523,35 @@ static int devlink_reload_stat_put(struct sk_buff *msg, enum devlink_reload_acti
>>> return -EMSGSIZE;
>>> }
>>>
>>> -static int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink)
>>> +static int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote)
>>> {
>>> struct nlattr *reload_stats_attr;
>>> int i, j, stat_idx;
>>> u32 value;
>>>
>>> - reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS);
>>> + if (!is_remote)
>>> + reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS);
>>> + else
>>> + reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_REMOTE_RELOAD_STATS);
>>>
>>> if (!reload_stats_attr)
>>> return -EMSGSIZE;
>>>
>>> for (j = 0; j <= DEVLINK_RELOAD_LIMIT_MAX; j++) {
>>> - if (j != DEVLINK_RELOAD_LIMIT_UNSPEC &&
>>> + if (!is_remote && j != DEVLINK_RELOAD_LIMIT_UNSPEC &&
>> I don't follow the check "!is_remote" here,
>
>
> We agreed that remote stats should be shown also for non supported
> actions and limits, because its remote. So it makes this condition
> different for remote stats. Rethinking about it, maybe that's wrong. I
> mean if we had here reload actions as a result of remote driver, they
> have common device, so it has to be the same type of driver and support
> same actions/limits, right ?
>

Obviously it runs the same device but.. technically, couldn't the remote
device be running a different version of the driver? i.e. what if it
supports some new mode that this host doesn't yet understand? (or does
understand but has a driver which doesn't yet?)