Re: [PATCH net-next v4 4/4] net: tun: track dropped skb via kfree_skb_reason()

From: Dongli Zhang
Date: Wed Mar 02 2022 - 13:20:43 EST


Hi Jakub and David,

On 3/1/22 6:50 PM, Jakub Kicinski wrote:
> On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote:
>> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */
>> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */
>
> IDK if these are not too low level and therefore lacking meaning.
>
> What are your thoughts David?
>
> Would it be better to up level the names a little bit and call SKB_PULL
> something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe
> "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN?

This is for device driver and I think for most of cases the people understanding
source code will be involved. I think SKB_PULL is more meaningful to people
understanding source code.

The functions to pull data to skb is commonly used with the same pattern, and
not only for ETH_HLEN. E.g., I randomly found below in kernel source code.

1071 static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb)
1072 {
... ...
1102 pulled_sci = pskb_may_pull(skb, macsec_extra_len(true));
1103 if (!pulled_sci) {
1104 if (!pskb_may_pull(skb, macsec_extra_len(false)))
1105 goto drop_direct;
1106 }
... ...
1254 drop_direct:
1255 kfree_skb(skb);
1256 *pskb = NULL;
1257 return RX_HANDLER_CONSUMED;


About 'L2_HDR_ERR', I am curious what the user/administrator may do as next
step, while the 'SKB_PULL' will be very clear to the developers which kernel
operation (e.g., to pull some protocol/hdr data to sk_buff data) is with the issue.

I may use 'L2_HDR_ERR' if you prefer.

>
> For SKB_TRIM the error comes from allocation failures, there may be
> a whole bunch of skb helpers which will fail only under mem pressure,
> would it be better to identify them and return some ENOMEM related
> reason, since, most likely, those will be noise to whoever is tracking
> real errors?

The reasons I want to use SKB_TRIM:

1. To have SKB_PULL and SKB_TRIM (perhaps more SKB_XXX in the future in the same
set).

2. Although so that SKB_TRIM is always caused by ENOMEM, suppose if there is new
return values by pskb_trim(), the reason is not going to be valid any longer.


I may use SKB_DROP_REASON_NOMEM if you prefer.

Another concern is that many functions may return -ENOMEM. It is more likely
that if there are two "goto drop" to return -ENOMEM, we will not be able to tell
from which function the sk_buff is dropped, e.g.,

if (function_A()) {
reason = -ENOMEM;
goto drop;
}

if (function_B()) {
reason = -ENOMEM;
goto drop;
}

>
>> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with
>> * device driver specific header
>> */
>> + SKB_DROP_REASON_DEV_READY, /* device is not ready */
>
> What is ready? link is not up? peer not connected? can we expand?
>

In this patchset, it is for either:

- tun->tfiles[txq] is not set, or

- !(tun->dev->flags & IFF_UP)

I want to make it very generic so that the sk_buff dropped due to any device
level data structure that is not up/ready/initialized/allocated will use this
reason in the future.


Thank you very much for suggestions!

Dongli Zhang