Re: [PATCH bpf-next v5] virtio_net: add XDP meta data support

From: Michael S. Tsirkin
Date: Sun Feb 23 2020 - 03:15:06 EST


On Fri, Feb 21, 2020 at 05:36:08PM +0900, Yuya Kusakabe wrote:
> On 2/21/20 1:23 PM, Jason Wang wrote:
> >
> > On 2020/2/20 äå4:55, Yuya Kusakabe wrote:
> >> Implement support for transferring XDP meta data into skb for
> >> virtio_net driver; before calling into the program, xdp.data_meta points
> >> to xdp.data, where on program return with pass verdict, we call
> >> into skb_metadata_set().
> >>
> >> Tested with the script at
> >> https://github.com/higebu/virtio_net-xdp-metadata-test.
> >>
> >> Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")
> >
> >
> > I'm not sure this is correct since virtio-net claims to not support metadata by calling xdp_set_data_meta_invalid()?
>
> virtio_net doesn't support by calling xdp_set_data_meta_invalid() for now.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/drivers/net/virtio_net.c?id=e42da4c62abb547d9c9138e0e7fcd1f36057b5e8#n686
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/drivers/net/virtio_net.c?id=e42da4c62abb547d9c9138e0e7fcd1f36057b5e8#n842
>
> And xdp_set_data_meta_invalid() are added by de8f3a83b0a0.
>
> $ git blame ./drivers/net/virtio_net.c | grep xdp_set_data_meta_invalid
> de8f3a83b0a0f (Daniel Borkmann 2017-09-25 02:25:51 +0200 686) xdp_set_data_meta_invalid(&xdp);
> de8f3a83b0a0f (Daniel Borkmann 2017-09-25 02:25:51 +0200 842) xdp_set_data_meta_invalid(&xdp);
>
> So I added `Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")` to the comment.
>
> >
> >
> >> Signed-off-by: Yuya Kusakabe <yuya.kusakabe@xxxxxxxxx>
> >> ---
> >> v5:
> >> Â - page_to_skb(): copy vnet header if hdr_valid without checking metasize.
> >> Â - receive_small(): do not copy vnet header if xdp_prog is availavle.
> >> Â - __virtnet_xdp_xmit_one(): remove the xdp_set_data_meta_invalid().
> >> Â - improve comments.
> >> v4:
> >> Â - improve commit message
> >> v3:
> >> Â - fix preserve the vnet header in receive_small().
> >> v2:
> >> Â - keep copy untouched in page_to_skb().
> >> Â - preserve the vnet header in receive_small().
> >> Â - fix indentation.
> >> ---
> >> Â drivers/net/virtio_net.c | 54 ++++++++++++++++++++++++----------------
> >> Â 1 file changed, 33 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 2fe7a3188282..4ea0ae60c000 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -371,7 +371,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ struct receive_queue *rq,
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ struct page *page, unsigned int offset,
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned int len, unsigned int truesize,
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ bool hdr_valid)
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ bool hdr_valid, unsigned int metasize)
> >> Â {
> >> ÂÂÂÂÂ struct sk_buff *skb;
> >> ÂÂÂÂÂ struct virtio_net_hdr_mrg_rxbuf *hdr;
> >> @@ -393,6 +393,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >> ÂÂÂÂÂ else
> >> ÂÂÂÂÂÂÂÂÂ hdr_padded_len = sizeof(struct padded_vnet_hdr);
> >> Â +ÂÂÂ /* hdr_valid means no XDP, so we can copy the vnet header */
> >> ÂÂÂÂÂ if (hdr_valid)
> >> ÂÂÂÂÂÂÂÂÂ memcpy(hdr, p, hdr_len);
> >> Â @@ -405,6 +406,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >> ÂÂÂÂÂÂÂÂÂ copy = skb_tailroom(skb);
> >> ÂÂÂÂÂ skb_put_data(skb, p, copy);
> >> Â +ÂÂÂ if (metasize) {
> >> +ÂÂÂÂÂÂÂ __skb_pull(skb, metasize);
> >> +ÂÂÂÂÂÂÂ skb_metadata_set(skb, metasize);
> >> +ÂÂÂ }
> >> +
> >> ÂÂÂÂÂ len -= copy;
> >> ÂÂÂÂÂ offset += copy;
> >> Â @@ -450,10 +456,6 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> >> ÂÂÂÂÂ struct virtio_net_hdr_mrg_rxbuf *hdr;
> >> ÂÂÂÂÂ int err;
> >> Â -ÂÂÂ /* virtqueue want to use data area in-front of packet */
> >> -ÂÂÂ if (unlikely(xdpf->metasize > 0))
> >> -ÂÂÂÂÂÂÂ return -EOPNOTSUPP;
> >> -
> >> ÂÂÂÂÂ if (unlikely(xdpf->headroom < vi->hdr_len))
> >> ÂÂÂÂÂÂÂÂÂ return -EOVERFLOW;
> >> Â @@ -644,6 +646,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
> >> ÂÂÂÂÂ unsigned int delta = 0;
> >> ÂÂÂÂÂ struct page *xdp_page;
> >> ÂÂÂÂÂ int err;
> >> +ÂÂÂ unsigned int metasize = 0;
> >> Â ÂÂÂÂÂ len -= vi->hdr_len;
> >> ÂÂÂÂÂ stats->bytes += len;
> >> @@ -683,8 +686,8 @@ static struct sk_buff *receive_small(struct net_device *dev,
> >> Â ÂÂÂÂÂÂÂÂÂ xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
> >> ÂÂÂÂÂÂÂÂÂ xdp.data = xdp.data_hard_start + xdp_headroom;
> >> -ÂÂÂÂÂÂÂ xdp_set_data_meta_invalid(&xdp);
> >> ÂÂÂÂÂÂÂÂÂ xdp.data_end = xdp.data + len;
> >> +ÂÂÂÂÂÂÂ xdp.data_meta = xdp.data;
> >> ÂÂÂÂÂÂÂÂÂ xdp.rxq = &rq->xdp_rxq;
> >> ÂÂÂÂÂÂÂÂÂ orig_data = xdp.data;
> >> ÂÂÂÂÂÂÂÂÂ act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >> @@ -695,6 +698,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ /* Recalculate length in case bpf program changed it */
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ delta = orig_data - xdp.data;
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ len = xdp.data_end - xdp.data;
> >> +ÂÂÂÂÂÂÂÂÂÂÂ metasize = xdp.data - xdp.data_meta;
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ break;
> >> ÂÂÂÂÂÂÂÂÂ case XDP_TX:
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ stats->xdp_tx++;
> >> @@ -735,11 +739,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
> >> ÂÂÂÂÂ }
> >> ÂÂÂÂÂ skb_reserve(skb, headroom - delta);
> >> ÂÂÂÂÂ skb_put(skb, len);
> >> -ÂÂÂ if (!delta) {
> >> +ÂÂÂ if (!xdp_prog) {
> >> ÂÂÂÂÂÂÂÂÂ buf += header_offset;
> >> ÂÂÂÂÂÂÂÂÂ memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
> >> ÂÂÂÂÂ } /* keep zeroed vnet hdr since packet was changed by bpf */
> >
> >
> > I prefer to make this an independent patch and cc stable.
> >
> > Other looks good.
> >
> > Thanks
>
> I see. So I need to revert to delta from xdp_prog?
>
> Thank you.

So maybe send a 2 patch series: 1/2 is this chunk with the appropriate
description. Actually for netdev David prefers that people do not
cc stable directly, just include Fixes tag and mention in the
commit log it's also needed for stable. Patch 2/2 is the rest
handling metadata.

> >
> >> Â +ÂÂÂ if (metasize)
> >> +ÂÂÂÂÂÂÂ skb_metadata_set(skb, metasize);
> >> +
> >> Â err:
> >> ÂÂÂÂÂ return skb;
> >> Â @@ -760,8 +767,8 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ struct virtnet_rq_stats *stats)
> >> Â {
> >> ÂÂÂÂÂ struct page *page = buf;
> >> -ÂÂÂ struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len,
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ PAGE_SIZE, true);
> >> +ÂÂÂ struct sk_buff *skb =
> >> +ÂÂÂÂÂÂÂ page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0);
> >> Â ÂÂÂÂÂ stats->bytes += len - vi->hdr_len;
> >> ÂÂÂÂÂ if (unlikely(!skb))
> >> @@ -793,6 +800,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >> ÂÂÂÂÂ unsigned int truesize;
> >> ÂÂÂÂÂ unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> >> ÂÂÂÂÂ int err;
> >> +ÂÂÂ unsigned int metasize = 0;
> >> Â ÂÂÂÂÂ head_skb = NULL;
> >> ÂÂÂÂÂ stats->bytes += len - vi->hdr_len;
> >> @@ -839,8 +847,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >> ÂÂÂÂÂÂÂÂÂ data = page_address(xdp_page) + offset;
> >> ÂÂÂÂÂÂÂÂÂ xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
> >> ÂÂÂÂÂÂÂÂÂ xdp.data = data + vi->hdr_len;
> >> -ÂÂÂÂÂÂÂ xdp_set_data_meta_invalid(&xdp);
> >> ÂÂÂÂÂÂÂÂÂ xdp.data_end = xdp.data + (len - vi->hdr_len);
> >> +ÂÂÂÂÂÂÂ xdp.data_meta = xdp.data;
> >> ÂÂÂÂÂÂÂÂÂ xdp.rxq = &rq->xdp_rxq;
> >> Â ÂÂÂÂÂÂÂÂÂ act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >> @@ -848,24 +856,27 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >> Â ÂÂÂÂÂÂÂÂÂ switch (act) {
> >> ÂÂÂÂÂÂÂÂÂ case XDP_PASS:
> >> +ÂÂÂÂÂÂÂÂÂÂÂ metasize = xdp.data - xdp.data_meta;
> >> +
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ /* recalculate offset to account for any header
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂ * adjustments. Note other cases do not build an
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂ * skb and avoid using offset
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂ * adjustments and minus the metasize to copy the
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂ * metadata in page_to_skb(). Note other cases do not
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂ * build an skb and avoid using offset
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ */
> >> -ÂÂÂÂÂÂÂÂÂÂÂ offset = xdp.data -
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ page_address(xdp_page) - vi->hdr_len;
> >> +ÂÂÂÂÂÂÂÂÂÂÂ offset = xdp.data - page_address(xdp_page) -
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ vi->hdr_len - metasize;
> >> Â -ÂÂÂÂÂÂÂÂÂÂÂ /* recalculate len if xdp.data or xdp.data_end were
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂ * adjusted
> >> +ÂÂÂÂÂÂÂÂÂÂÂ /* recalculate len if xdp.data, xdp.data_end or
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂ * xdp.data_meta were adjusted
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ */
> >> -ÂÂÂÂÂÂÂÂÂÂÂ len = xdp.data_end - xdp.data + vi->hdr_len;
> >> +ÂÂÂÂÂÂÂÂÂÂÂ len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ /* We can only create skb based on xdp_page. */
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ if (unlikely(xdp_page != page)) {
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ rcu_read_unlock();
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ put_page(page);
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ head_skb = page_to_skb(vi, rq, xdp_page,
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ offset, len,
> >> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ PAGE_SIZE, false);
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ head_skb = page_to_skb(vi, rq, xdp_page, offset,
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ len, PAGE_SIZE, false,
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ metasize);
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return head_skb;
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ }
> >> ÂÂÂÂÂÂÂÂÂÂÂÂÂ break;
> >> @@ -921,7 +932,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >> ÂÂÂÂÂÂÂÂÂ goto err_skb;
> >> ÂÂÂÂÂ }
> >> Â -ÂÂÂ head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog);
> >> +ÂÂÂ head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
> >> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ metasize);
> >> ÂÂÂÂÂ curr_skb = head_skb;
> >> Â ÂÂÂÂÂ if (unlikely(!curr_skb))
> >