PROBLEM: Kernel Oops in UDP stack

From: Marcel Hellwig
Date: Tue Jul 31 2018 - 11:12:08 EST


Dear all,

We are facing a problem with the UDP stack in our embedded device, which is based on an LPC3250.

We first discovered the bug in the 2.6.39.2 kernel provided by lpc [0].
We tried several newer kernel versions up to 3.4.113, and the error still occurs.
Newer versions of the kernel have not been tested (yet).

We have a simple program that listens on a multicast address and uses select() to wait on the socket.
We read the data, validate it and process it further (put it into shared memory obtained via shm_open).
The traffic bandwidth is approximately 100 Mbit/s.

We tried to debug the error with gdb and printfs, but all the pointers in the relevant section looked sane, and printing their values did not trigger a panic.
The bug occurs after approximately 15 minutes under high network load, but cannot be triggered reliably.

We found two relevant threads [1][2]. The first one did not help; the second one looks promising but has no answer.

Because this bug affects a lot of our products, we want to develop an intermediate patch, but we need some help to locate the error.
In the long term we want to migrate to a newer kernel, but at the moment we need this fix for our customers.

Can anybody help us locate the error so that we can develop a patch for this problem?
Perhaps this is a known issue and a solution is already available.



Further information:
https://gist.github.com/hellow554/6b11c6c0827d5db80a7e66f71f5636ff

/proc/version:
Linux version 3.4.113.7 (buildroot@buildroot) (gcc version 4.9.4 (Buildroot 2018.02.1) ) #1 PREEMPT Mon Apr 9 23:40:00 CEST 2018

Kernel oops:
[ 1125.090000] Unable to handle kernel paging request at virtual address c14fe63a
[ 1125.100000] pgd = c14d8000
[ 1125.100000] [c14fe63a] *pgd=8140041e(bad)
[ 1125.100000] Internal error: Oops: 1 [#1] PREEMPT ARM
[ 1125.100000] Modules linked in:
[ 1125.100000] CPU: 0 Not tainted (3.4.113.7 #1)
[ 1125.100000] PC is at udp_recvmsg+0x284/0x33c
[ 1125.100000] LR is at 0x0
[ 1125.100000] pc : [<c0228adc>] lr : [<00000000>] psr: a0000013
[ 1125.100000] sp : c1e67d10 ip : 00000000 fp : 0000004a
[ 1125.100000] r10: c1e67d34 r9 : 0000004a r8 : 00000000
[ 1125.100000] r7 : 000005c0 r6 : c1e10220 r5 : c1e67f7c r4 : c14f4640
[ 1125.100000] r3 : c14fe62e r2 : c1e67ec0 r1 : 00000008 r0 : c1e67ec8
[ 1125.100000] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 1125.100000] Control: 0005317f Table: 814d8000 DAC: 00000015
[ 1125.100000] Process trdp.release (pid: 132, stack limit = 0xc1e66270)
[ 1125.100000] Stack: (0xc1e67d10 to 0xc1e68000)
[ 1125.100000] 7d00: c1e67d34 00004348 00000001 0000004a
[ 1125.100000] 7d20: 00000000 c1e67ec0 c1e10220 00000000 00000000 00000000 c022aea4 c1e67f7c
[ 1125.100000] 7d40: 00000000 00000000 00000000 000005c0 00000000 c1e67f7c c1e67ec0 c02306e0
[ 1125.100000] 7d60: 00000000 00000000 c1e67d74 00000000 c1e10220 00000000 c1e67d90 00000000
[ 1125.100000] 7d80: c3483000 c01d2c38 00000000 00000001 00000001 00000000 00000000 000005c0
[ 1125.100000] 7da0: c3483000 c005c5d4 00000000 c1e67f7c 00000000 c03482b8 c1e66000 c1e67ee0
[ 1125.100000] 7dc0: c1e67e4c 0001424f c0345ba0 c035fca0 c035fca8 c005c6a8 00000000 00000001
[ 1125.100000] 7de0: ffffffff 00000000 00000000 00000000 00000000 00000000 c3887700 c0022608
[ 1125.100000] 7e00: 00000000 00000000 c01e59b4 00000001 c1e67d90 00000000 00000000 beca3b78
[ 1125.100000] 7e20: 00000004 00000000 00000004 c00b05b0 c1e67e48 c1e67e4c c1e67e50 00000001
[ 1125.100000] 7e40: c1e67f7c c3483000 beca35bc c1e67e80 c1e67f7c c1e67e80 c01d2b90 c3483000
[ 1125.100000] 7e60: beca35bc 00000000 c1e67e80 c01d3fac beca35d8 00000008 beca35c0 beca359c
[ 1125.100000] 7e80: b6ab34aa 00000576 00000001 c0022128 c025ce20 c1e49a80 00000009 0001424e
[ 1125.100000] 7ea0: 40008000 c1e66008 0000001d 00000000 c1e67f14 c3887700 c1e66008 0000001d
[ 1125.100000] 7ec0: b0040002 1714010a 00000000 00000000 c00189a8 00000013 f4008000 c000dbf4
[ 1125.100000] 7ee0: 81e68000 c1e49180 00000000 00000000 c1e1e000 c003c2f8 c1e1e000 00000002
[ 1125.100000] 7f00: c036092c c0346700 00000000 00000000 ffffffff 00000000 ffffffff 00000000
[ 1125.100000] 7f20: c1e67f78 c1e66000 c1e67f78 beca3cc0 00000001 beca3cc0 00000008 beca35bc
[ 1125.100000] 7f40: 00000000 c3483000 beca35bc 00000000 00000129 c000e188 c1e66000 00000000
[ 1125.100000] 7f60: beca37e0 c01d4fbc 00000000 beca3860 beca3938 00000000 fffffff7 c1e67ec0
[ 1125.100000] 7f80: 00000000 c1e67e80 00000001 beca359c 00000020 00000000 beca359c b6ab3460
[ 1125.100000] 7fa0: 00000000 c000dfe0 beca359c b6ab3460 00000004 beca35bc 00000000 00000020
[ 1125.100000] 7fc0: beca359c b6ab3460 00000000 00000129 beca37f4 00000000 00056178 beca37e0
[ 1125.100000] 7fe0: 00000004 beca33d0 0001a904 b6f6d8dc 60000010 00000004 83ffe831 83ffec31
[ 1125.100000] [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c)
[ 1125.100000] [<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc)
[ 1125.100000] [<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc)
[ 1125.100000] [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80)
[ 1125.100000] [<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c)
[ 1125.100000] Code: e1d330b0 e3a01008 e1c230b2 e5943080 (e593300c)
[ 1125.430000] ---[ end trace f0b7642b14562089 ]---
[ 1125.440000] ------------[ cut here ]------------
[ 1125.450000] WARNING: at net/ipv4/af_inet.c:153 inet_sock_destruct+0x188/0x1a8()
[ 1125.460000] Modules linked in:
[ 1125.460000] [<c0013ac8>] (unwind_backtrace+0x0/0xec) from [<c001c588>] (warn_slowpath_common+0x4c/0x64)
[ 1125.470000] [<c001c588>] (warn_slowpath_common+0x4c/0x64) from [<c001c63c>] (warn_slowpath_null+0x1c/0x24)
[ 1125.480000] [<c001c63c>] (warn_slowpath_null+0x1c/0x24) from [<c0230888>] (inet_sock_destruct+0x188/0x1a8)
[ 1125.490000] [<c0230888>] (inet_sock_destruct+0x188/0x1a8) from [<c01d75f4>] (__sk_free+0x18/0x154)
[ 1125.500000] [<c01d75f4>] (__sk_free+0x18/0x154) from [<c0230aa0>] (inet_release+0x44/0x70)
[ 1125.510000] [<c0230aa0>] (inet_release+0x44/0x70) from [<c01d3714>] (sock_release+0x20/0xc8)
[ 1125.510000] [<c01d3714>] (sock_release+0x20/0xc8) from [<c01d37d0>] (sock_close+0x14/0x2c)
[ 1125.520000] [<c01d37d0>] (sock_close+0x14/0x2c) from [<c00a0044>] (fput+0xb4/0x27c)
[ 1125.530000] [<c00a0044>] (fput+0xb4/0x27c) from [<c009d64c>] (filp_close+0x64/0x88)
[ 1125.540000] [<c009d64c>] (filp_close+0x64/0x88) from [<c001fb28>] (put_files_struct+0x80/0xe0)
[ 1125.550000] [<c001fb28>] (put_files_struct+0x80/0xe0) from [<c0020388>] (do_exit+0x4c8/0x748)
[ 1125.560000] [<c0020388>] (do_exit+0x4c8/0x748) from [<c0011894>] (die+0x214/0x240)
[ 1125.560000] [<c0011894>] (die+0x214/0x240) from [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74)
[ 1125.570000] [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74) from [<c0015188>] (do_bad_area+0x88/0x8c)
[ 1125.580000] [<c0015188>] (do_bad_area+0x88/0x8c) from [<c00173dc>] (do_alignment+0xf0/0x938)
[ 1125.590000] [<c00173dc>] (do_alignment+0xf0/0x938) from [<c000862c>] (do_DataAbort+0x34/0x98)
[ 1125.600000] [<c000862c>] (do_DataAbort+0x34/0x98) from [<c000db98>] (__dabt_svc+0x38/0x60)
[ 1125.610000] Exception stack(0xc1e67cc8 to 0xc1e67d10)
[ 1125.610000] 7cc0: c1e67ec8 00000008 c1e67ec0 c14fe62e c14f4640 c1e67f7c
[ 1125.620000] 7ce0: c1e10220 000005c0 00000000 0000004a c1e67d34 0000004a 00000000 c1e67d10
[ 1125.630000] 7d00: 00000000 c0228adc a0000013 ffffffff
[ 1125.640000] [<c000db98>] (__dabt_svc+0x38/0x60) from [<c0228adc>] (udp_recvmsg+0x284/0x33c)
[ 1125.650000] [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c)
[ 1125.650000] [<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc)
[ 1125.660000] [<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc)
[ 1125.670000] [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80)
[ 1125.680000] [<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c)
[ 1125.690000] ---[ end trace f0b7642b1456208a ]---
[ 1125.700000] ------------[ cut here ]------------
[ 1125.700000] WARNING: at net/ipv4/af_inet.c:156 inet_sock_destruct+0x158/0x1a8()
[ 1125.710000] Modules linked in:
[ 1125.710000] [<c0013ac8>] (unwind_backtrace+0x0/0xec) from [<c001c588>] (warn_slowpath_common+0x4c/0x64)
[ 1125.720000] [<c001c588>] (warn_slowpath_common+0x4c/0x64) from [<c001c63c>] (warn_slowpath_null+0x1c/0x24)
[ 1125.730000] [<c001c63c>] (warn_slowpath_null+0x1c/0x24) from [<c0230858>] (inet_sock_destruct+0x158/0x1a8)
[ 1125.740000] [<c0230858>] (inet_sock_destruct+0x158/0x1a8) from [<c01d75f4>] (__sk_free+0x18/0x154)
[ 1125.750000] [<c01d75f4>] (__sk_free+0x18/0x154) from [<c0230aa0>] (inet_release+0x44/0x70)
[ 1125.760000] [<c0230aa0>] (inet_release+0x44/0x70) from [<c01d3714>] (sock_release+0x20/0xc8)
[ 1125.770000] [<c01d3714>] (sock_release+0x20/0xc8) from [<c01d37d0>] (sock_close+0x14/0x2c)
[ 1125.780000] [<c01d37d0>] (sock_close+0x14/0x2c) from [<c00a0044>] (fput+0xb4/0x27c)
[ 1125.780000] [<c00a0044>] (fput+0xb4/0x27c) from [<c009d64c>] (filp_close+0x64/0x88)
[ 1125.790000] [<c009d64c>] (filp_close+0x64/0x88) from [<c001fb28>] (put_files_struct+0x80/0xe0)
[ 1125.800000] [<c001fb28>] (put_files_struct+0x80/0xe0) from [<c0020388>] (do_exit+0x4c8/0x748)
[ 1125.810000] [<c0020388>] (do_exit+0x4c8/0x748) from [<c0011894>] (die+0x214/0x240)
[ 1125.820000] [<c0011894>] (die+0x214/0x240) from [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74)
[ 1125.830000] [<c0256160>] (__do_kernel_fault.part.0+0x54/0x74) from [<c0015188>] (do_bad_area+0x88/0x8c)
[ 1125.840000] [<c0015188>] (do_bad_area+0x88/0x8c) from [<c00173dc>] (do_alignment+0xf0/0x938)
[ 1125.850000] [<c00173dc>] (do_alignment+0xf0/0x938) from [<c000862c>] (do_DataAbort+0x34/0x98)
[ 1125.850000] [<c000862c>] (do_DataAbort+0x34/0x98) from [<c000db98>] (__dabt_svc+0x38/0x60)
[ 1125.860000] Exception stack(0xc1e67cc8 to 0xc1e67d10)
[ 1125.870000] 7cc0: c1e67ec8 00000008 c1e67ec0 c14fe62e c14f4640 c1e67f7c
[ 1125.880000] 7ce0: c1e10220 000005c0 00000000 0000004a c1e67d34 0000004a 00000000 c1e67d10
[ 1125.880000] 7d00: 00000000 c0228adc a0000013 ffffffff
[ 1125.890000] [<c000db98>] (__dabt_svc+0x38/0x60) from [<c0228adc>] (udp_recvmsg+0x284/0x33c)
[ 1125.900000] [<c0228adc>] (udp_recvmsg+0x284/0x33c) from [<c02306e0>] (inet_recvmsg+0x38/0x4c)
[ 1125.910000] [<c02306e0>] (inet_recvmsg+0x38/0x4c) from [<c01d2c38>] (sock_recvmsg+0xa8/0xcc)
[ 1125.920000] [<c01d2c38>] (sock_recvmsg+0xa8/0xcc) from [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc)
[ 1125.930000] [<c01d3fac>] (___sys_recvmsg.part.4+0xe0/0x1bc) from [<c01d4fbc>] (__sys_recvmsg+0x50/0x80)
[ 1125.940000] [<c01d4fbc>] (__sys_recvmsg+0x50/0x80) from [<c000dfe0>] (ret_fast_syscall+0x0/0x2c)
[ 1125.940000] ---[ end trace f0b7642b1456208b ]---


[0]: http://git.lpclinux.com/?p=linux-2.6.39.2-lpc.git;a=summary
[1]: http://lists.openwall.net/netdev/2009/03/09/28
[2]: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/176757.html


With kind regards

Marcel Hellwig
B.Sc. Computer Science
Developer

m-u-t GmbH
Am Marienhof 2
22880 Wedel
Germany

Phone: +49 4103 9308 - 474
Fax:   +49 4103 9308 - 99
mailto:mhellwig@xxxxxxxxxxxxx

http://www.mut-group.com

Managing Director: Fabian Peters
Amtsgericht Pinneberg (Commercial Register No.): HRB 10304 PI
VAT No.: DE228275390
WEEE Reg. No.: DE 72271808