Re: [PATCH v1 1/1] xfrm: Use skb_mac_header_was_set() to check for MAC header presence

From: Marcello Sylvester Bauer
Date: Mon Sep 04 2023 - 06:09:08 EST


Hi Eyal,

On 9/2/23 20:39, Eyal Birger wrote:
Hi Marcello,

On Fri, Sep 1, 2023 at 7:15 PM Marcello Sylvester Bauer
<email@xxxxxxxxxxxxxxxxxx> wrote:

From: Marcello Sylvester Bauer <sylv@xxxxxxx>

Replace skb->mac_len with skb_mac_header_was_set() in
xfrm4_remove_tunnel_encap() and xfrm6_remove_tunnel_encap() to detect
the presence of a MAC header. This change prevents a kernel page fault
that could occur when a MAC address is added to an L3 interface using
xfrm.

Signed-off-by: Marcello Sylvester Bauer <sylv@xxxxxxx>
---
net/xfrm/xfrm_input.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index d5ee96789d4b..afa1f1052065 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -250,7 +250,7 @@ static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)

skb_reset_network_header(skb);
skb_mac_header_rebuild(skb);
- if (skb->mac_len)
+ if (skb_mac_header_was_set(skb))
eth_hdr(skb)->h_proto = skb->protocol;

err = 0;
@@ -287,7 +287,7 @@ static int xfrm6_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)

skb_reset_network_header(skb);
skb_mac_header_rebuild(skb);
- if (skb->mac_len)
+ if (skb_mac_header_was_set(skb))
eth_hdr(skb)->h_proto = skb->protocol;

I checked xfrm tunnels over both GRE and IPIP, and in both cases when reaching
this code the skb->mac_len was 0, whereas skb_mac_header_was_set()
was 1.

So in both cases this suggested patch would make this condition true and
write to an Ethernet header location on these L3 devices.

Oh, so my first guess was right and we need to check for both.

We were able to get socket buffers where mac_len was non-zero but mac_header was not set. I am not sure whether this is a bug somewhere else, but it causes an "unable to handle page fault for address XXX" error.

I have attached a decoded crash log to this email using kernel version 6.1.38.

As you can see, the line net/xfrm/xfrm_input.c:290, which contains the following, causes the error:
```
eth_hdr(skb)->h_proto = skb->protocol;
```

IIUC RAX contains mac_header and holds 0xffff, meaning the MAC header is not set. This causes skb_mac_header() to return a wrong address for the header, and the write to h_proto results in the page fault.
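
As a cross-check of that reading, the fault address can be recomputed from the register dump in the attached log. The sketch below is plain user-space C, not kernel code. It assumes that RDX holds skb->head and RAX holds skb->mac_header, which is my interpretation of the disassembly, and struct eth_header and h_proto_store_address() are hypothetical names that only mirror the 6 + 6 + 2 byte Ethernet header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the Ethernet header layout (hypothetical name). */
struct eth_header {
	uint8_t  h_dest[6];
	uint8_t  h_source[6];
	uint16_t h_proto;
};

/*
 * Address written by "mov %cx,0xc(%rdx,%rax,1)":
 * head + mac_header + offsetof(h_proto).
 * Assumption: head corresponds to RDX and mac_header to RAX in the oops.
 */
static uint64_t h_proto_store_address(uint64_t head, uint16_t mac_header)
{
	return head + mac_header + offsetof(struct eth_header, h_proto);
}
```

Calling h_proto_store_address(0xffff888101bd7940, 0xffff) with the RDX and RAX values from the log gives 0xffff888101be794b, which is the address reported in CR2. That is consistent with the store landing roughly 64 KiB past the buffer head because mac_header still held its "unset" value.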


Can you please share your reproduction scenario for this case?

We have a special setup where a WWAN interface is forwarded to a QEMU VM as a virtual network interface. Here is the relevant information based on the internal bug report that addresses this issue:

---

Kernel version: 6.1.38

The problem occurs with Quectel EM120R-GL and Intel 5000 5G modems.
Both modems do not support bridge mode, i.e. no L2 but only L3 ipraw communication. Specifically, the interface has no MAC address.

By default, our application uses the MAC address of the physical adapter for the virtual adapter, so that, for example, host entries in dnsmasq are created for the Windows VM.

Modify net_hotplug in udev: assign a fake MAC address to the wwan interface (this actually only writes the MAC to a file, which is passed to the VM as its MAC in the hotplug case).

# mac address required for hotplug -> end
#ATTRS{address}=="", GOTO="tvpnc_network_end"

ACTION=="add", ATTRS{address}=="", \
NAME=="en*|wl*|ww*", \
RUN+="/usr/share/scripts/net_hotplug.sh -a -i $name -m aa:bb:cc:dd:ee:ff"

ACTION=="add", ATTRS{address}!="", \
NAME=="en*|wl*|ww*", \
RUN+="/usr/share/scripts/net_hotplug.sh -a -i $name -m $attr{address}"

When Windows establishes a cellular connection and connects to a strongSwan VPN, the host Linux kernel crashes and immediately restarts.

---

The custom QEMU hotplug script net_hotplug.sh adds the virtual network interface to the running VM and binds it to the VM using the real or the "fake" MAC address.

I got feedback that adding "skb_mac_header_was_set(skb)" to the condition fixed the problem.

The second version of this patch will check both mac_len and whether the header was set.
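
To illustrate how the combined condition behaves in the cases discussed in this thread, here is a minimal user-space sketch. It is not the v2 patch itself. struct fake_skb and both helper names are hypothetical stand-ins, and the comparison against an all-ones offset models what skb_mac_header_was_set() does in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields discussed here. */
struct fake_skb {
	uint16_t mac_len;    /* link-layer header length, 0 on L3 devices */
	uint16_t mac_header; /* offset from the buffer head, 0xffff if unset */
};

/* Models the kernel helper: the offset counts as set unless it is all ones. */
static bool fake_mac_header_was_set(const struct fake_skb *skb)
{
	return skb->mac_header != (uint16_t)~0U;
}

/* Combined condition: write h_proto only if both checks pass. */
static bool fake_may_write_eth_proto(const struct fake_skb *skb)
{
	return skb->mac_len != 0 && fake_mac_header_was_set(skb);
}
```

With this condition, the GRE/IPIP case Eyal describes (mac_len of 0, header set) and the crash case above (non-zero mac_len, mac_header of 0xffff) both skip the write, while an skb with a non-zero mac_len and a valid mac_header offset still gets h_proto updated.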

Thanks
Marcello


Thanks,
Eyal.


err = 0;
--
2.42.0

[ 1008.666817] BUG: unable to handle page fault for address: ffff888101be794b
[ 1008.667469] #PF: supervisor write access in kernel mode
[ 1008.668200] #PF: error_code(0x0003) - permissions violation
[ 1008.668939] PGD 2801067 P4D 2801067 PUD 106a13063 PMD 103e8f063 PTE 8000000101be7061
[ 1008.669697] Oops: 0003 [#1] PREEMPT SMP NOPTI
[ 1008.670435] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 6.1.38 #1
[ 1008.671193] Hardware name: LENOVO 20WM00A7GE/20WM00A7GE, BIOS N35ET50W (1.50 ) 09/15/2022
[ 1008.671990] RIP: 0010:xfrm_input (net/xfrm/xfrm_input.c:290 net/xfrm/xfrm_input.c:349 net/xfrm/xfrm_input.c:391 net/xfrm/xfrm_input.c:443 net/xfrm/xfrm_input.c:689)
[ 1008.672872] Code: 8d 0b 00 66 41 83 7c 24 78 00 0f 84 5d fa ff ff 41 0f b7 8c 24 b0 00 00 00 41 0f b7 84 24 b6 00 00 00 49 8b 94 24 c0 00 00 00 <66> 89 4c 02 0c e9 39 fa ff ff 41 b8 08 00 00 00 66 45 89 84 24 b0
All code
========
0: 8d 0b lea (%rbx),%ecx
2: 00 66 41 add %ah,0x41(%rsi)
5: 83 7c 24 78 00 cmpl $0x0,0x78(%rsp)
a: 0f 84 5d fa ff ff je 0xfffffffffffffa6d
10: 41 0f b7 8c 24 b0 00 movzwl 0xb0(%r12),%ecx
17: 00 00
19: 41 0f b7 84 24 b6 00 movzwl 0xb6(%r12),%eax
20: 00 00
22: 49 8b 94 24 c0 00 00 mov 0xc0(%r12),%rdx
29: 00
2a:* 66 89 4c 02 0c mov %cx,0xc(%rdx,%rax,1) <-- trapping instruction
2f: e9 39 fa ff ff jmp 0xfffffffffffffa6d
34: 41 b8 08 00 00 00 mov $0x8,%r8d
3a: 66 data16
3b: 45 rex.RB
3c: 89 .byte 0x89
3d: 84 24 b0 test %ah,(%rax,%rsi,4)

Code starting with the faulting instruction
===========================================
0: 66 89 4c 02 0c mov %cx,0xc(%rdx,%rax,1)
5: e9 39 fa ff ff jmp 0xfffffffffffffa43
a: 41 b8 08 00 00 00 mov $0x8,%r8d
10: 66 data16
11: 45 rex.RB
12: 89 .byte 0x89
13: 84 24 b0 test %ah,(%rax,%rsi,4)
[ 1008.673815] RSP: 0018:ffffc900001e4b28 EFLAGS: 00010206
[ 1008.674754] RAX: 000000000000ffff RBX: ffff88811955da51 RCX: 0000000000000008
[ 1008.675751] RDX: ffff888101bd7940 RSI: 0000000000000004 RDI: 0000000000000004
[ 1008.676939] RBP: ffffc900001e4b90 R08: 0000000000000041 R09: 0000000000000000
[ 1008.677900] R10: 0000000000000080 R11: 0000000000000000 R12: ffff88811955da00
[ 1008.678885] R13: 0000000000000002 R14: 0000000000000000 R15: ffff88811f793600
[ 1008.679703] FS: 0000000000000000(0000) GS:ffff8882576c0000(0000) knlGS:0000000000000000
[ 1008.680497] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1008.681303] CR2: ffff888101be794b CR3: 000000000240a003 CR4: 0000000000772ea0
[ 1008.682109] PKRU: 55555554
[ 1008.682927] Call Trace:
[ 1008.683772] <IRQ>
[ 1008.684661] ? show_regs.part.0 (arch/x86/kernel/dumpstack.c:479)
[ 1008.685537] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 1008.686366] ? page_fault_oops (arch/x86/mm/fault.c:727)
[ 1008.687207] ? kernelmode_fixup_or_oops (arch/x86/mm/fault.c:782)
[ 1008.688050] ? __bad_area_nosemaphore (arch/x86/mm/fault.c:880)
[ 1008.688868] ? bad_area_nosemaphore (arch/x86/mm/fault.c:887)
[ 1008.689696] ? exc_page_fault (arch/x86/mm/fault.c:1232 arch/x86/mm/fault.c:1469 arch/x86/mm/fault.c:1527)
[ 1008.690511] ? skb_copy_bits (net/core/skbuff.c:2531)
[ 1008.691315] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
[ 1008.692232] ? xfrm_input (net/xfrm/xfrm_input.c:290 net/xfrm/xfrm_input.c:349 net/xfrm/xfrm_input.c:391 net/xfrm/xfrm_input.c:443 net/xfrm/xfrm_input.c:689)
[ 1008.693039] ? xfrm_input (net/xfrm/xfrm_input.c:677)
[ 1008.693921] xfrmi_input (net/xfrm/xfrm_interface_core.c:336)
[ 1008.694981] xfrmi4_input (net/xfrm/xfrm_interface_core.c:352)
[ 1008.695779] xfrm4_rcv_encap (net/ipv4/xfrm4_protocol.c:84)
[ 1008.696502] xfrm4_udp_encap_rcv (net/ipv4/xfrm4_input.c:161)
[ 1008.697127] ? xfrm4_rcv (net/ipv4/xfrm4_input.c:83)
[ 1008.697765] udp_queue_rcv_one_skb (net/ipv4/udp.c:2154)
[ 1008.698397] udp_queue_rcv_skb (net/ipv4/udp.c:2245)
[ 1008.699021] udp_unicast_rcv_skb (net/ipv4/udp.c:2399 (discriminator 3))
[ 1008.699657] __udp4_lib_rcv (net/ipv4/udp.c:2463)
[ 1008.700242] ? ipt_do_table (net/ipv4/netfilter/ip_tables.c:362)
[ 1008.700792] ? raw_local_deliver (net/ipv4/raw.c:199)
[ 1008.701371] udp_rcv (net/ipv4/udp.c:2644)
[ 1008.701981] ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 4))
[ 1008.702591] ip_local_deliver_finish (./include/linux/rcupdate.h:779 net/ipv4/ip_input.c:234)
[ 1008.703202] ip_local_deliver (./include/linux/netfilter.h:302 ./include/linux/netfilter.h:296 net/ipv4/ip_input.c:254)
[ 1008.703824] ? ip_protocol_deliver_rcu (net/ipv4/ip_input.c:228)
[ 1008.704440] ip_rcv (./include/net/dst.h:454 net/ipv4/ip_input.c:449 ./include/linux/netfilter.h:302 ./include/linux/netfilter.h:296 net/ipv4/ip_input.c:569)
[ 1008.705059] ? ip_sublist_rcv (net/ipv4/ip_input.c:436)
[ 1008.705666] __netif_receive_skb_one_core (net/core/dev.c:5496 (discriminator 4))
[ 1008.706271] process_backlog (./include/linux/rcupdate.h:779 net/core/dev.c:5939)
[ 1008.706886] ? _raw_read_unlock_bh (kernel/locking/spinlock.c:285)
[ 1008.707463] __napi_poll (net/core/dev.c:6505)
[ 1008.708024] net_rx_action (net/core/dev.c:6574 net/core/dev.c:6683)
[ 1008.708658] __do_softirq (kernel/softirq.c:571)
[ 1008.709214] irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650 kernel/softirq.c:662)
[ 1008.709764] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
[ 1008.710323] </IRQ>
[ 1008.710859] <TASK>
[ 1008.711393] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:640)
[ 1008.711927] RIP: 0010:cpuidle_enter_state (drivers/cpuidle/cpuidle.c:261)
[ 1008.712471] Code: f8 7c 9d ff 31 ff 49 89 c7 e8 be 14 9d ff 80 7d d7 00 74 12 9c 58 f6 c4 02 0f 85 c7 01 00 00 31 ff e8 c6 8e a1 ff fb 45 85 f6 <0f> 88 c9 00 00 00 49 63 ce 4c 89 fa 48 2b 55 c8 48 6b f1 68 48 8d
All code
========
0: f8 clc
1: 7c 9d jl 0xffffffffffffffa0
3: ff 31 push (%rcx)
5: ff 49 89 decl -0x77(%rcx)
8: c7 (bad)
9: e8 be 14 9d ff call 0xffffffffff9d14cc
e: 80 7d d7 00 cmpb $0x0,-0x29(%rbp)
12: 74 12 je 0x26
14: 9c pushf
15: 58 pop %rax
16: f6 c4 02 test $0x2,%ah
19: 0f 85 c7 01 00 00 jne 0x1e6
1f: 31 ff xor %edi,%edi
21: e8 c6 8e a1 ff call 0xffffffffffa18eec
26: fb sti
27: 45 85 f6 test %r14d,%r14d
2a:* 0f 88 c9 00 00 00 js 0xf9 <-- trapping instruction
30: 49 63 ce movslq %r14d,%rcx
33: 4c 89 fa mov %r15,%rdx
36: 48 2b 55 c8 sub -0x38(%rbp),%rdx
3a: 48 6b f1 68 imul $0x68,%rcx,%rsi
3e: 48 rex.W
3f: 8d .byte 0x8d

Code starting with the faulting instruction
===========================================
0: 0f 88 c9 00 00 00 js 0xcf
6: 49 63 ce movslq %r14d,%rcx
9: 4c 89 fa mov %r15,%rdx
c: 48 2b 55 c8 sub -0x38(%rbp),%rdx
10: 48 6b f1 68 imul $0x68,%rcx,%rsi
14: 48 rex.W
15: 8d .byte 0x8d
[ 1008.713054] RSP: 0018:ffffc90000137e60 EFLAGS: 00000206
[ 1008.713638] RAX: ffff8882576e1e80 RBX: ffff8882576ea8a0 RCX: 000000000000001f
[ 1008.714225] RDX: 0000000000000000 RSI: 0000000034e8f93a RDI: 0000000000000000
[ 1008.714892] RBP: ffffc90000137e98 R08: 000000ead937ca49 R09: 00000000000217c0
[ 1008.715645] R10: 0000000000000001 R11: ffff8882576e0f04 R12: 0000000000000003
[ 1008.716614] R13: ffffffff82462360 R14: 0000000000000003 R15: 000000ead937ca49
[ 1008.717414] cpuidle_enter (drivers/cpuidle/cpuidle.c:358)
[ 1008.718116] call_cpuidle (kernel/sched/idle.c:156)
[ 1008.718713] do_idle (kernel/sched/idle.c:240 kernel/sched/idle.c:303)
[ 1008.719288] cpu_startup_entry (kernel/sched/idle.c:399 (discriminator 1))
[ 1008.719874] start_secondary (arch/x86/kernel/smpboot.c:281)
[ 1008.720441] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
[ 1008.721006] </TASK>
[ 1008.721539] Modules linked in: snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 openvswitch mhi_wwan_mbim mhi_wwan_ctrl wwan snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence iwlmvm snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof mac80211 snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core libarc4 snd_compress cdc_mbim intel_gtt soundwire_bus ttm iwlwifi cdc_wdm drm_buddy snd_hda_intel drm_display_helper snd_intel_dspcfg snd_intel_sdw_acpi drm_kms_helper snd_hda_codec processor_thermal_device_pci_legacy nls_iso8859_1 cdc_ncm nls_cp437 processor_thermal_device snd_hwdep cdc_ether intel_tcc_cooling x86_pkg_temp_thermal snd_hda_core processor_thermal_rfim intel_powerclamp syscopyarea usbnet cfg80211 processor_thermal_mbox mii sysfillrect hid_multitouch
[ 1008.721584] snd_pcm coretemp input_leds sysimgblt think_lmi intel_rapl_msr mhi_pci_generic i2c_i801 thinkpad_acpi processor_thermal_rapl snd_timer i2c_designware_platform mhi firmware_attributes_class serio_raw thunderbolt i2c_designware_core intel_rapl_common nvram fb_sys_fops tpm_crb platform_profile ucsi_acpi int3400_thermal tpm_tis intel_lpss_pci typec_ucsi pinctrl_tigerlake ledtrig_audio intel_lpss int3403_thermal i2c_smbus roles tpm_tis_core video mfd_core acpi_thermal_rel pinctrl_intel intel_soc_dts_iosf int340x_thermal_zone wmi tdisk [last unloaded: crypto_simd]
[ 1008.727258] CR2: ffff888101be794b