Re: [PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers

From: Stefano Garzarella
Date: Tue Jul 01 2025 - 06:45:39 EST


On Mon, Jun 30, 2025 at 03:20:57PM +0100, Will Deacon wrote:
On Fri, Jun 27, 2025 at 12:45:45PM +0200, Stefano Garzarella wrote:
On Wed, Jun 25, 2025 at 02:15:41PM +0100, Will Deacon wrote:
> When receiving a packet from a guest, vhost_vsock_handle_tx_kick()
> calls vhost_vsock_alloc_skb() to allocate and fill an SKB with the
> receive data. Unfortunately, these are always linear allocations and can
> therefore result in significant pressure on kmalloc() considering that
> the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE +
> VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB
> allocation for each packet.
>
> Rework the vsock SKB allocation so that, for sizes with page order
> greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated
> instead with the packet header in the SKB and the receive data in the
> fragments.
>
> Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> ---
> drivers/vhost/vsock.c | 15 +++++++++------
> include/linux/virtio_vsock.h | 31 +++++++++++++++++++++++++------
> 2 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 66a0f060770e..cfa4e1bcf367 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
>
> len = iov_length(vq->iov, out);
>
> - if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> + if (len < VIRTIO_VSOCK_SKB_HEADROOM ||

Why moving this check here?

I moved it here because virtio_vsock_alloc_skb_with_frags() does:

+ size -= VIRTIO_VSOCK_SKB_HEADROOM;
+ return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
+ size, mask);

and so having the check in __virtio_vsock_alloc_skb_with_frags() looks
strange as, by then, it really only applies to the linear case. It also
feels weird to me to have the upper-bound of the length checked by the
caller but the lower-bound checked in the callee. I certainly find it
easier to reason about if they're in the same place.

Additionally, the lower-bound check is only needed by the vhost receive
code, as the transmit path uses virtio_vsock_alloc_skb(), which never
passes a size smaller than VIRTIO_VSOCK_SKB_HEADROOM.

Given all that, moving it to the one place that needs it seemed like the
best option. What do you think?

Okay, I see now. Yep, it's fine, but please mention in the commit description.


> + len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
> return NULL;
>
> /* len contains both payload and hdr */
> - skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
> + if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> + skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
> + else
> + skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);

Can we do this directly in virtio_vsock_alloc_skb() so we don't need
to duplicate code on virtio/vhost code?

We can, but then I think we should do something different for the
rx_fill() path -- it feels fragile to rely on that using small-enough
buffers to guarantee linear allocations. How about I:

1. Add virtio_vsock_alloc_linear_skb(), which always performs a linear
allocation.

2. Change virtio_vsock_alloc_skb() to use nonlinear SKBs for sizes
greater than SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)

3. Use virtio_vsock_alloc_linear_skb() to fill the guest RX buffers

4. Use virtio_vsock_alloc_skb() for everything else

If you like the idea, I'll rework the series along those lines.
Diff below... (see end of mail)

I really like it :-) let's go in that direction!


> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 67ffb64325ef..8f9fa1cab32a 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -51,27 +51,46 @@ static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
> {
> u32 len;
>
> + DEBUG_NET_WARN_ON_ONCE(skb->len);

Should we mention in the commit message?

Sure, I'll add something. The non-linear handling doesn't accumulate len,
so it's a debug check to ensure that len hasn't been messed with between
allocation and here.

> len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
>
> - if (len > 0)

Why removing this check?

I think it's redundant: len is a u32, so we're basically just checking
to see if it's non-zero. All the callers have already checked for this
but, even if they didn't, skb_put(skb, 0) is harmless afaict.

Yep, I see, but now I don't remember why we have it, could it be more
expensive to call `skb_put(skb, 0)`, instead of just having the if for
control packets with no payload?

Thanks,
Stefano


Will

--->8

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 3799c0aeeec5..a6cd72a32f63 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -349,11 +349,7 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
return NULL;

/* len contains both payload and hdr */
- if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
- skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
- else
- skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
-
+ skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
if (!skb)
return NULL;

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 0e265921be03..ed5eab46e3dc 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -79,16 +79,19 @@ __virtio_vsock_alloc_skb_with_frags(unsigned int header_len,
}

static inline struct sk_buff *
-virtio_vsock_alloc_skb_with_frags(unsigned int size, gfp_t mask)
+virtio_vsock_alloc_linear_skb(unsigned int size, gfp_t mask)
{
- size -= VIRTIO_VSOCK_SKB_HEADROOM;
- return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
- size, mask);
+ return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
}

static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
{
- return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
+ if (size <= SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
+ return virtio_vsock_alloc_linear_skb(size, mask);
+
+ size -= VIRTIO_VSOCK_SKB_HEADROOM;
+ return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
+ size, mask);
}

static inline void
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 4ae714397ca3..8c9ca0cb0d4e 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -321,7 +321,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
vq = vsock->vqs[VSOCK_VQ_RX];

do {
- skb = virtio_vsock_alloc_skb(total_len, GFP_KERNEL);
+ skb = virtio_vsock_alloc_linear_skb(total_len, GFP_KERNEL);
if (!skb)
break;

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 424eb69e84f9..f74677c3511e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -262,11 +262,7 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
if (!zcopy)
skb_len += payload_len;

- if (skb_len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
- skb = virtio_vsock_alloc_skb_with_frags(skb_len, GFP_KERNEL);
- else
- skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
-
+ skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
if (!skb)
return NULL;