Re: [PATCH net-next v1 1/1] net: openvswitch: ovs_packet_cmd_execute put sw_flow mainbody in stack

From: Simon Horman
Date: Sun Feb 19 2023 - 08:56:28 EST


On Sat, Feb 18, 2023 at 02:53:29PM +0800, Eddy Tao wrote:
> Add 2 performance revisions for ovs_packet_cmd_execute

In general it's nicer to make one change per patch,
i.e. split this into two patches.

> 1. Stores mainbody of sw_flow (600+ bytes) in stack
> Benefit: avoid kmem cache alloc/free caused by ovs_flow_alloc/free

Perhaps I am wrong, but 600 bytes seems like a lot of stack memory to consume,
and thus probably needs a strong justification.
Do you have some performance numbers showing a benefit of this change?
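
To illustrate the stack-size concern, here is a minimal userspace sketch of a
compile-time guard on an on-stack object. The struct, the 1024-byte budget and
all names are hypothetical stand-ins chosen for illustration; this is not the
real sw_flow layout and not a proposal for the patch itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a ~600-byte structure; not the real sw_flow. */
struct example_flow {
	unsigned char body[600];
};

/* Hypothetical per-function stack budget, chosen only for illustration. */
#define EXAMPLE_STACK_BUDGET 1024

/* Fail the build if the on-stack object outgrows the budget. */
_Static_assert(sizeof(struct example_flow) <= EXAMPLE_STACK_BUDGET,
	       "example_flow is too large to place on the stack");

/* Zero an on-stack instance and report how many bytes it occupied. */
static size_t example_use_on_stack(void)
{
	struct example_flow flow;

	memset(&flow, 0, sizeof(flow));
	return sizeof(flow) + flow.body[0];
}
```

A guard like this only catches growth of the structure; it says nothing about
how deep the call chain below the function already is.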

> 2. Define sw_flow_without_stats_init to initialize mainbody of
> struct sw_flow, which does not provide memory for sw_flow_stats.
> Reason: ovs_execute_actions does not touch sw_flow_stats.

Are there other code-paths that would also benefit from this change?

> Benefit: less memzero, say each 'sw_flow_stats *' takes 4/8
> bytes, on systems with 20 to 128 logical CPUs, this is a good deal.

Less is more :)
Do you have some performance numbers showing a benefit of this change?
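
For a sense of scale, the zeroing being avoided can be estimated from the
figures quoted above. This is a rough userspace sketch that assumes one
'sw_flow_stats *' per logical CPU; the helper name is made up for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: bytes occupied by the per-CPU stats pointers,
 * assuming one pointer of ptr_size bytes for each of nr_cpus CPUs.
 */
static size_t stats_ptr_bytes(unsigned int nr_cpus, size_t ptr_size)
{
	return (size_t)nr_cpus * ptr_size;
}
```

With the numbers from the commit message, stats_ptr_bytes(20, 4) gives 80 bytes
and stats_ptr_bytes(128, 8) gives 1024 bytes. That bounds the memset saving,
but it does not replace measured numbers.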

> Signed-off-by: Eddy Tao <taoyuan_eddy@xxxxxxxxxxx>
> ---
> net/openvswitch/datapath.c | 22 ++++++++++++----------
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index fcee6012293b..337947d34355 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -589,6 +589,12 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>  	return err;
>  }
>
> +static void sw_flow_without_stats_init(struct sw_flow *flow)
> +{
> +	memset(flow, 0, sizeof(*flow));
> +	flow->stats_last_writer = -1;
> +}
> +
>  static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
>  {
>  	struct ovs_header *ovs_header = info->userhdr;
> @@ -596,7 +602,8 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
>  	struct nlattr **a = info->attrs;
>  	struct sw_flow_actions *acts;
>  	struct sk_buff *packet;
> -	struct sw_flow *flow;
> +	struct sw_flow f;
> +	struct sw_flow *flow = &f;

I'm not sure it's really useful to have both f and flow.
Could we just have the following?

struct sw_flow flow;

Also, it would be nice to move towards rather than away from
reverse xmas tree - longest line to shortest line - arrangement of local
variables in OVS code.
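
As an illustration of that ordering, here is a small self-contained sketch.
The types and the function are hypothetical stand-ins, not OVS code; only the
order of the local variable declarations matters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in types; not the kernel definitions. */
struct example_actions { int count; };
struct example_packet { size_t len; };

/* Locals are declared longest line first, shortest line last. */
static int example_execute(const char *payload)
{
	struct example_actions actions = { .count = 1 };
	struct example_packet packet = { .len = 0 };
	int err = 0;

	packet.len = strlen(payload);
	if (packet.len == 0)
		err = -1;

	return err ? err : actions.count;
}
```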

>  	struct sw_flow_actions *sf_acts;
>  	struct datapath *dp;
>  	struct vport *input_vport;
> @@ -636,20 +643,18 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
>  	}
>
>  	/* Build an sw_flow for sending this packet. */
> -	flow = ovs_flow_alloc();
> -	err = PTR_ERR(flow);
> -	if (IS_ERR(flow))
> -		goto err_kfree_skb;
> +	/* This flow has no sw_flow_stats */
> +	sw_flow_without_stats_init(flow);
>
>  	err = ovs_flow_key_extract_userspace(net, a[OVS_PACKET_ATTR_KEY],
>  					     packet, &flow->key, log);
>  	if (err)
> -		goto err_flow_free;
> +		goto err_kfree_skb;
>
>  	err = ovs_nla_copy_actions(net, a[OVS_PACKET_ATTR_ACTIONS],
>  				   &flow->key, &acts, log);
>  	if (err)
> -		goto err_flow_free;
> +		goto err_kfree_skb;
>
>  	rcu_assign_pointer(flow->sf_acts, acts);
>  	packet->priority = flow->key.phy.priority;
> @@ -677,13 +682,10 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
>  	local_bh_enable();
>  	rcu_read_unlock();
>
> -	ovs_flow_free(flow, false);
>  	return err;
>
>  err_unlock:
>  	rcu_read_unlock();
> -err_flow_free:
> -	ovs_flow_free(flow, false);
>  err_kfree_skb:
>  	kfree_skb(packet);
>  err:
> --
> 2.27.0
>