Re: [PATCH] net: add net namespace inode for all net_dev events

From: Steven Rostedt
Date: Tue Mar 09 2021 - 12:41:12 EST


On Tue, 9 Mar 2021 12:43:50 +0800
Tony Lu <tonylu@xxxxxxxxxxxxxxxxx> wrote:

> There are lots of net namespaces on the host runs containers like k8s.
> It is very common to see the same interface names among different net
> namespaces, such as eth0. It is not possible to distinguish them without
> net namespace inode.
>
> This adds net namespace inode for all net_dev events, help us
> distinguish between different net devices.
>
> Output:
> <idle>-0 [006] ..s. 133.306989: net_dev_xmit: net_inum=4026531992 dev=eth0 skbaddr=0000000011a87c68 len=54 rc=0
>
> Signed-off-by: Tony Lu <tonylu@xxxxxxxxxxxxxxxxx>
> ---
> include/trace/events/net.h | 35 +++++++++++++++++++++++++----------
> 1 file changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index 2399073c3afc..a52f90d83411 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -35,6 +35,7 @@ TRACE_EVENT(net_dev_start_xmit,
> __field( u16, gso_size )
> __field( u16, gso_segs )
> __field( u16, gso_type )
> + __field( unsigned int, net_inum )
> ),

This patch made me take a look at the net_dev_start_xmit trace event, and I
see it's very "holy". That is, it has lots of holes in the structure.

TP_STRUCT__entry(
__string( name, dev->name )
__field( u16, queue_mapping )
__field( const void *, skbaddr )
__field( bool, vlan_tagged )
__field( u16, vlan_proto )
__field( u16, vlan_tci )
__field( u16, protocol )
__field( u8, ip_summed )
__field( unsigned int, len )
__field( unsigned int, data_len )
__field( int, network_offset )
__field( bool, transport_offset_valid)
__field( int, transport_offset)
__field( u8, tx_flags )
__field( u16, gso_size )
__field( u16, gso_segs )
__field( u16, gso_type )
__field( unsigned int, net_inum )
),

If you look at /sys/kernel/tracing/events/net/net_dev_start_xmit/format

name: net_dev_start_xmit
ID: 1581
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:__data_loc char[] name; offset:8; size:4; signed:1;
field:u16 queue_mapping; offset:12; size:2; signed:0;
field:const void * skbaddr; offset:16; size:8; signed:0;

Notice, queue_mapping is 2 bytes at offset 12 (ends at offset 14), but
skbaddr starts at offset 16. That means there's two bytes wasted.

field:bool vlan_tagged; offset:24; size:1; signed:0;
field:u16 vlan_proto; offset:26; size:2; signed:0;

Another byte missing above (24 + 1 != 26)

field:u16 vlan_tci; offset:28; size:2; signed:0;
field:u16 protocol; offset:30; size:2; signed:0;
field:u8 ip_summed; offset:32; size:1; signed:0;
field:unsigned int len; offset:36; size:4; signed:0;

Again another three bytes missing (32 + 1 != 36)

field:unsigned int data_len; offset:40; size:4; signed:0;
field:int network_offset; offset:44; size:4; signed:1;
field:bool transport_offset_valid; offset:48; size:1; signed:0;
field:int transport_offset; offset:52; size:4; signed:1;

Again, another 3 bytes missing (48 + 1 != 52)

field:u8 tx_flags; offset:56; size:1; signed:0;
field:u16 gso_size; offset:58; size:2; signed:0;

Another byte missing (56 + 1 != 58)

field:u16 gso_segs; offset:60; size:2; signed:0;
field:u16 gso_type; offset:62; size:2; signed:0;
field:unsigned int net_inum; offset:64; size:4; signed:0;

The above shows 10 bytes wasted for this event.

The order of the fields is important. Don't worry about breaking API by
fixing it. The parsing code uses this output to find where the binary data
is.

Hmm, I should write a script that reads all the format files and point out
areas that have holes in it.

I haven't looked at the other trace events, but they may all have the same
issues.

-- Steve



>
> TP_fast_assign(
> @@ -56,10 +57,12 @@ TRACE_EVENT(net_dev_start_xmit,
> __entry->gso_size = skb_shinfo(skb)->gso_size;
> __entry->gso_segs = skb_shinfo(skb)->gso_segs;
> __entry->gso_type = skb_shinfo(skb)->gso_type;
> + __entry->net_inum = dev_net(skb->dev)->ns.inum;
> ),
>
>