Re: [PATCH net-next] net:prevent shared skb corruption on rx-gro-list segmentation

From: Eric Dumazet
Date: Fri Oct 27 2023 - 08:40:14 EST


On Fri, Oct 27, 2023 at 1:28 PM Lena Wang (王娜) <Lena.Wang@xxxxxxxxxxxx> wrote:
>
> From 1d4aea846dd08d824e1097a933ed8618a6da03da Mon Sep 17 00:00:00 2001
> From: lena wang <lena.wang@xxxxxxxxxxxx>
> Date: Fri, 27 Oct 2023 17:52:50 +0800
> Subject: [PATCH net-next] net:prevent shared skb corruption on rx-gro-
> list segmentation
>
> shared skb triggers corruptions on rx-gro-list segmentation.
>
> [42438.749474] [ T1425] Unable to handle kernel NULL pointer
> dereference at virtual address 00000000000000d8
> [42438.749530] [ T1425] Mem abort info:
> [42438.749547] [ T1425] ESR = 0x0000000096000006
> [42438.749566] [ T1425] EC = 0x25: DABT (current EL), IL = 32 bits
> [42438.749588] [ T1425] SET = 0, FnV = 0
> [42438.749606] [ T1425] EA = 0, S1PTW = 0
> [42438.749623] [ T1425] FSC = 0x06: level 2 translation fault
> [42438.749642] [ T1425] Data abort info:
> [42438.749656] [ T1425] ISV = 0, ISS = 0x00000006
> [42438.749674] [ T1425] CM = 0, WnR = 0
> [42438.749691] [ T1425] user pgtable: 4k pages, 39-bit VAs,
> pgdp=00000001f96b9000
> [42438.749714] [ T1425] [00000000000000d8] pgd=08000001f96ba003,
> p4d=08000001f96ba003, pud=08000001f96ba003, pmd=0000000000000000
> [42438.749769] [ T1425] Internal error: Oops: 0000000096000006 [#1]
> PREEMPT SMP
> [42438.749796] [ T1425] cpufreq stop DVFS log done
> [42438.797144] [ T1425] Kernel Offset: 0x29eb000000 from
> 0xffffffc008000000
> [42438.797185] [ T1425] PHYS_OFFSET: 0x40000000
> [42438.797203] [ T1425] pstate: 80400005 (Nzcv daif +PAN -UAO)
> [42438.797219] [ T1425] pc : [0xffffffe9f3e38c34]
> __udp_gso_segment+0x24c/0x48c
> [42438.797248] [ T1425] lr : [0xffffffe9f3e38bf4]
> __udp_gso_segment+0x20c/0x48c
> [42438.797269] [ T1425] sp : ffffffc017acb5c0
> [42438.797280] [ T1425] x29: ffffffc017acb5d0 x28: 000000000000c85f
> ......
> [42438.799172] [ T1425] Call trace:
> [42438.799184] [ T1425] dump_backtrace+0xf4/0x118
> [42438.799210] [ T1425] show_stack+0x18/0x24
> [42438.799227] [ T1425] dump_stack_lvl+0x60/0x7c
> [42438.799250] [ T1425] dump_stack+0x18/0x3c
> [42438.799270] [ T1425] mrdump_common_die+0x24c/0x398 [mrdump]
> [42438.799354] [ T1425] ipanic_die+0x20/0x34 [mrdump]
> [42438.799425] [ T1425] notify_die+0x80/0xd8
> [42438.799447] [ T1425] die+0x94/0x2b8
> [42438.799463] [ T1425] __do_kernel_fault+0x264/0x298
> [42438.799485] [ T1425] do_page_fault+0x98/0x4a0
> [42438.799503] [ T1425] do_translation_fault+0x38/0x54
> [42438.799521] [ T1425] do_mem_abort+0x58/0x118
> [42438.799540] [ T1425] el1_abort+0x3c/0x5c
> [42438.799555] [ T1425] el1h_64_sync_handler+0x54/0x90
> [42438.799575] [ T1425] el1h_64_sync+0x68/0x6c
> [42438.799591] [ T1425] __udp_gso_segment+0x24c/0x48c
> [42438.799611] [ T1425] udp4_ufo_fragment+0x118/0x15c
> [42438.799628] [ T1425] inet_gso_segment+0x164/0x338
> [42438.799644] [ T1425] skb_mac_gso_segment+0xc4/0x13c
> [42438.799666] [ T1425] __skb_gso_segment+0xc4/0x124
> [42438.799686] [ T1425] validate_xmit_skb+0x9c/0x2e0
> [42438.799703] [ T1425] validate_xmit_skb_list+0x4c/0x80
> [42438.799722] [ T1425] sch_direct_xmit+0x70/0x3d0
> [42438.799739] [ T1425] __dev_queue_xmit+0x5f0/0xd40
> [42438.799757] [ T1425] ip_finish_output2+0x3f8/0x460
> [42438.799774] [ T1425] __ip_finish_output+0x194/0x240
> [42438.799790] [ T1425] ip_finish_output+0x20/0xf4
> [42438.799810] [ T1425] ip_output+0x100/0x1a0
> [42438.799830] [ T1425] NF_HOOK+0xac/0x154
> [42438.799848] [ T1425] ip_forward+0x308/0x320
> [42438.799868] [ T1425] ip_sublist_rcv+0x1f0/0x25c
> [42438.799887] [ T1425] ip_list_rcv+0x138/0x174
> [42438.799905] [ T1425] __netif_receive_skb_list_core+0x1e8/0x28c
> [42438.799923] [ T1425] netif_receive_skb_list_internal+0x1ec/0x2b4
> [42438.799942] [ T1425] netif_receive_skb_list+0x2c/0x144
> [42438.799960] [ T1425] ccmni_queue_state_callback+0x98/0x304 [ccmni]
> [42438.800009] [ T1425] port_net_queue_state_notify+0x178/0x1c0
> [ccci_md_all]
> [42438.800257] [ T1425] ccci_port_queue_status_notify+0xf0/0x2a8
> [ccci_md_all]
> [42438.800496] [ T1425] dpmaif_rxq_push_thread+0xb0/0x1d4
> [ccci_dpmaif]
> [42438.800602] [ T1425] kthread+0x104/0x1d4
> [42438.800621] [ T1425] ret_from_fork+0x10/0x20
>
> In some scenarios the GRO-ed skb shared with multi users. This
> segmentation touches the shared heads which sets frag_list to null.
> After linearization the skb->next is null which results the corruption.

What scenarios would do that ?

Packets sent to ndo_start_xmit() are not supposed to be shared. (with
pktgen exception, which is well understood)

A driver wants to be able to overwrite parts of skb (skb->next and
many other fields), this is not specific to GSO

A real fix (if needed) would probably be elsewhere.

>
> So for shared skb, it needs to clone first than unclone with header and
> data separated for different devices.
>
> Signed-off-by: lena wang <lena.wang@xxxxxxxxxxxx>

Why targeting net-next ?


> ---
> net/core/skbuff.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b157efea5dea..adeb3ad9697b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4351,6 +4351,12 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>
> skb_push(skb, -skb_network_offset(skb) + offset);
>
> + if (skb_shared(skb)) {
> + skb = skb_share_check(skb, GFP_ATOMIC);
> + if (!skb)
> + goto err_linearize;
> + }
> +
> /* Ensure the head is writeable before touching the shared info
> */
> err = skb_unclone(skb, GFP_ATOMIC);
> if (err)
> --
> 2.18.0