Re: kernel BUG in __ip_make_skb()

From: Willem de Bruijn
Date: Sun Jan 08 2023 - 14:38:37 EST


On Thu, Jan 5, 2023 at 4:17 PM Fedor Pchelkin <pchelkin@xxxxxxxxx> wrote:
>
> Syzkaller reports the following crash:
>
> kernel BUG at include/linux/skbuff.h:2311!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 2 PID: 4615 Comm: syz-executor260 Not tainted 5.10.152-syzkaller #0
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> RIP: 0010:__skb_pull include/linux/skbuff.h:2311 [inline]
> RIP: 0010:__ip_make_skb.cold+0x5b/0x5d net/ipv4/ip_output.c:1507
> Code: 79 a2 f9 8b 44 24 3c 89 ee 48 c7 c7 00 2e de 88 41 89 45 70 e8 d1 84 de ff 31 d2 4c 89 ee 48 c7 c7 40 2e de 88 e8 a1 26 ff ff <0f> 0b e8 63 79 a2 f9 e8 4e f7 e2 f9 48 c7 c7 40 39 de 88 e8 5e c1
> RSP: 0018:ffff88801e0af698 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88814c408288 RCX: ffffffff87ce00ec
> RDX: ffff888024a11ac0 RSI: ffffffff87ceccc6 RDI: 0000000000000001
> RBP: 0000000000000028 R08: 000000000000003b R09: ffff8880b8438ba7
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff8881436a9cc0
> R13: ffff8881436a9cc0 R14: ffff8881436a9e00 R15: dffffc0000000000
> FS: 00007f2750eb7700(0000) GS:ffff8880b8500000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4438a29010 CR3: 0000000021f58000 CR4: 0000000000350ee0
> Call Trace:
> ip_finish_skb include/net/ip.h:241 [inline]
> udp_push_pending_frames net/ipv4/udp.c:979 [inline]
> udp_sendpage+0x36e/0x570 net/ipv4/udp.c:1354
> inet_sendpage+0xd3/0x140 net/ipv4/af_inet.c:831
> kernel_sendpage.part.0+0x13c/0x280 net/socket.c:3514
> kernel_sendpage net/socket.c:3511 [inline]
> sock_sendpage+0xe5/0x140 net/socket.c:944
> pipe_to_sendpage+0x2af/0x380 fs/splice.c:364
> splice_from_pipe_feed fs/splice.c:418 [inline]
> __splice_from_pipe+0x3e5/0x840 fs/splice.c:562
> splice_from_pipe fs/splice.c:597 [inline]
> generic_splice_sendpage+0xd4/0x140 fs/splice.c:743
> do_splice_from fs/splice.c:764 [inline]
> direct_splice_actor+0x10f/0x170 fs/splice.c:933
> splice_direct_to_actor+0x38f/0x990 fs/splice.c:888
> do_splice_direct+0x1b3/0x280 fs/splice.c:976
> do_sendfile+0x553/0x10a0 fs/read_write.c:1257
> __do_sys_sendfile64 fs/read_write.c:1318 [inline]
> __se_sys_sendfile64 fs/read_write.c:1304 [inline]
> __x64_sys_sendfile64+0x1d0/0x210 fs/read_write.c:1304
> do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x61/0xc6
>
> It was actually found on a 5.10 kernel instance but I didn't find any
> upstream commit referencing something of that kind so this bug is likely
> to be in upstream, too. The reproducers unfortunately do not work on my
> machine, but I'll add one in the next email for additional info.
>
> From here the terminology is used as from the fragment of __ip_make_skb():
> ---
> if (skb->data < skb_network_header(skb))
> __skb_pull(skb, skb_network_offset(skb));
> while ((tmp_skb = __skb_dequeue(queue)) != NULL) {
> __skb_pull(tmp_skb, skb_network_header_len(skb)); <-- BUG is here
> ---
>
> We get the first fragment (called 'skb') from the queue and then start
> getting another fragments ('tmp_skb') from the queue and combine them to
> the first fragment.
>
> The problem is that the difference between tmp_skb->len and
> tmp_skb->data_len is smaller than the length of skb network (IP) header
> and while doing __skb_pull(), tmp_skb->len becomes smaller than
> tmp_skb->data_len causing a bug.
>
> Something is probably wrong with IP header layout of the first fragment
> (the SKB to which we are combining another ones). It is 40 bytes long, and
> the tmp_skb IP header's length is 20 bytes (have a look at debug info
> lower). The first fragment, however, can contain some specific IP options
> which are stored only in this fragment and are not included into the
> following ones. On the other hand, the problem can be with tmp_skb where
> something casued its data_len be incorrect.

The assumption in the code that the network header length is the same
for all fragments apparently does not hold.

ip_append_page reserves head room as follows:

fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
[..]
skb_reset_network_header(skb);
skb->transport_header = (skb->network_header +
fragheaderlen);

The reproducer you shared makes no explicit system call that modifies
optional header length, like IP_OPTIONS.

But it opens a PF_INET6/SOCK_DGRAM socket, and yet the stack trace
shows an IPv4 stack.

The repro shows multiple connect calls.

Perhaps it manages to switch between IPv6 and v4-mapped-v6 socket in
between sendmsg/sendpage calls?

Changing the argument to skb_network_header_len in
`__skb_pull(tmp_skb, skb_network_header_len(skb));` might
superficially address the out-of-bounds read. But it should probably
not be possible to switch between IPv6 and IPv4 while a datagram is
being constructed.

> Maybe there is some sanity check missing while constructing an IP datagram
> in ip_append_page()? Additional check of extra IP headers or MTU
> values...?
>
> We managed to get some debug info about the failing SKBs.
>
> tmp_skb info:
> skb len=1476 headroom=160 headlen=20 tailroom=0
> mac=(-1,-1) net=(160,20) trans=180
> shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0xe62f05e3 sw=0 l4=1) proto=0x0000 pkttype=0 iif=0
> sk family=2 type=2 proto=17
> skb linear: 00000000: 00 00 00 00 00 00 00 00 30 06 a9 86 ff ff ff ff
> skb linear: 00000010: 02 00 00 00
> skb frag: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000110: 00 00 00 00 00 00 00 00 00 00 00 00
>
> skb info:
> skb len=1476 headroom=168 headlen=241 tailroom=0
> mac=(-1,-1) net=(168,40) trans=208
> shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
> csum(0x3f886391 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0xe62f05e3 sw=0 l4=1) proto=0x86dd pkttype=0 iif=0
> sk family=2 type=2 proto=17
> skb linear: 00000000: 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00
> skb linear: 00000010: 80 44 f9 8a ff ff ff ff 00 00 00 00 00 00 00 00
> skb linear: 00000020: 00 00 00 00 00 00 00 00 fe ff ff ff 00 00 00 00
> skb linear: 00000030: f3 57 61 ad ca 6f 38 25 00 62 d5 b1 7d e1 d0 94
> skb linear: 00000040: 04 ae 20 57 20 1a 06 db 10 92 76 4f 8d 2e af 83
> skb linear: 00000050: 91 0c c3 cd b5 d1 96 9e c8 7e c8 e5 90 a9 be aa
> skb linear: 00000060: ae f8 7d 1d bb af 99 62 36 3f c9 a3 44 4e 18 fa
> skb linear: 00000070: 0e 5f 40 32 59 ad 8b 90 89 df 79 63 13 80 da 2b
> skb linear: 00000080: de 47 62 24 61 fd 47 d8 89 4a 74 8a 91 32 aa c6
> skb linear: 00000090: ad 59 30 2f 2c 2e 94 9f 83 00 46 5b f9 11 98 a9
> skb linear: 000000a0: cb ed ca cb 8d 70 d8 78 4c 95 b6 12 af 8b 81 33
> skb linear: 000000b0: 13 58 9d 74 ef 30 91 1d 10 bd 55 22 67 6b b9 43
> skb linear: 000000c0: 97 72 5c a2 c7 24 df f4 2c 3f b8 5e cb 3a 10 f6
> skb linear: 000000d0: 10 5f 3a 11 32 2a d5 22 b2 14 73 c0 1a df b0 6f
> skb linear: 000000e0: 3c 98 91 df bf d2 b6 99 ee 3c fe 91 98 f6 55 20
> skb linear: 000000f0: 45
> skb frag: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> More info:
> tmp_skb->len = 1476; tmp_skb->data_len = 1456
> skb->len = 1476; skb->data_len = 1235
>
> I actually found some commits with resembling call trace but they don't help
> me that much to solve the issue:
> 10b8a3de603d ("ipv6: the entire IPv6 header chain must fit the first fragment")
> e9d3f80935b6 ("net/af_packet: make sure to pull mac header")
> 501a90c94510 ("inet: protect against too small mtu values.")
>
> Fedor