Re: Linux-next: Kernel panic - not syncing: Fatal exception in interrupt - RIP: 0010:security_port_sid

From: Paul Moore
Date: Wed Aug 19 2020 - 09:29:20 EST


On Wed, Aug 19, 2020 at 9:16 AM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
> On 8/19/20 9:12 AM, Paul Moore wrote:
>
> > On Wed, Aug 19, 2020 at 8:28 AM Stephen Smalley
> > <stephen.smalley.work@xxxxxxxxx> wrote:
> >> On 8/19/20 6:11 AM, Naresh Kamboju wrote:
> >>> Kernel panic noticed on linux next 20200819 tag on x86_64 and i386.
> >>>
> >>> Kernel panic - not syncing: Fatal exception in interrupt
> >>>
> >>> metadata:
> >>> git branch: master
> >>> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> >>> git commit: 8eb858df0a5f6bcd371b5d5637255c987278b8c9
> >>> git describe: next-20200819
> >>> make_kernelversion: 5.9.0-rc1
> >>> kernel-config:
> >>> https://builds.tuxbuild.com/izEMrcIH10iI6m0FU7O0LA/kernel.config
> >>>
> >>> crash log:
> >>> [ 3.704578] BUG: kernel NULL pointer dereference, address: 00000000000001c8
> >>> [ 3.704865] #PF: supervisor read access in kernel mode
> >>> [ 3.704865] #PF: error_code(0x0000) - not-present page
> >>> [ 3.704865] PGD 0 P4D 0
> >>> [ 3.704865] Oops: 0000 [#1] SMP NOPTI
> >>> [ 3.704865] CPU: 0 PID: 1 Comm: systemd Not tainted
> >>> 5.9.0-rc1-next-20200819 #1
> >>> [ 3.704865] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >>> BIOS 1.12.0-1 04/01/2014
> >>> [ 3.704865] RIP: 0010:security_port_sid+0x2f/0xb0
> >>> [ 3.704865] Code: 55 48 89 e5 41 57 49 89 ff 41 56 49 89 ce 41 55
> >>> 41 89 d5 41 54 41 89 f4 53 48 8b 7f 40 e8 c9 ca 94 00 49 8b 47 40 48
> >>> 8b 40 10 <48> 8b 98 c8 01 00 00 48 85 db 75 0e eb 65 48 8b 9b c0 00 00
> >>> 00 48
> >>> [ 3.704865] RSP: 0018:ffffb607c0013d00 EFLAGS: 00010246
> >>> [ 3.704865] RAX: 0000000000000000 RBX: ffffffffaef076f8 RCX: ffffb607c0013d9c
> >>> [ 3.704865] RDX: 0000000000000016 RSI: 0000000000000006 RDI: ffffffffaef08d10
> >>> [ 3.704865] RBP: ffffb607c0013d28 R08: 0000000000000218 R09: 0000000000000016
> >>> [ 3.704865] R10: ffffb607c0013d9c R11: ffff988ff9665260 R12: 0000000000000006
> >>> [ 3.704865] R13: 0000000000000016 R14: ffffb607c0013d9c R15: ffffffffaef05820
> >>> [ 3.721157] FS: 00007f5ef4fec840(0000) GS:ffff988ffbc00000(0000)
> >>> knlGS:0000000000000000
> >>> [ 3.721157] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 3.721157] CR2: 00000000000001c8 CR3: 000000013b04c000 CR4: 00000000003506f0
> >>> [ 3.721157] Call Trace:
> >>> [ 3.721157] sel_netport_sid+0x120/0x1e0
> >>> [ 3.721157] selinux_socket_bind+0x15a/0x250
> >>> [ 3.721157] ? _raw_spin_trylock_bh+0x42/0x50
> >>> [ 3.721157] ? __local_bh_enable_ip+0x46/0x70
> >>> [ 3.721157] ? _raw_spin_unlock_bh+0x1a/0x20
> >>> [ 3.721157] security_socket_bind+0x35/0x50
> >>> [ 3.721157] __sys_bind+0xcf/0x110
> >>> [ 3.721157] ? syscall_enter_from_user_mode+0x1f/0x1f0
> >>> [ 3.730888] ? do_syscall_64+0x14/0x50
> >>> [ 3.730888] ? trace_hardirqs_on+0x38/0xf0
> >>> [ 3.732120] __x64_sys_bind+0x1a/0x20
> >>> [ 3.732120] do_syscall_64+0x38/0x50
> >>> [ 3.732120] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [ 3.732120] RIP: 0033:0x7f5ef37f3057
> >>> [ 3.732120] Code: ff ff ff ff c3 48 8b 15 3f 9e 2b 00 f7 d8 64 89
> >>> 02 b8 ff ff ff ff eb ba 66 2e 0f 1f 84 00 00 00 00 00 90 b8 31 00 00
> >>> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 11 9e 2b 00 f7 d8 64 89
> >>> 01 48
> >>> [ 3.738888] RSP: 002b:00007ffe638fbbb8 EFLAGS: 00000246 ORIG_RAX:
> >>> 0000000000000031
> >>> [ 3.738888] RAX: ffffffffffffffda RBX: 000055833cf9ef80 RCX: 00007f5ef37f3057
> >>> [ 3.738888] RDX: 000000000000001c RSI: 000055833cf9ef80 RDI: 000000000000002b
> >>> [ 3.743930] virtio_net virtio0 enp0s3: renamed from eth0
> >>> [ 3.738888] RBP: 000000000000002b R08: 0000000000000004 R09: 0000000000000000
> >>> [ 3.738888] R10: 00007ffe638fbbe4 R11: 0000000000000246 R12: 0000000000000000
> >>> [ 3.744849] R13: 00007ffe638fbbe4 R14: 0000000000000000 R15:
> >>> 0000000000000000
> >>> [ 3.744849] Modules linked in:
> >>> [ 3.744849] CR2: 00000000000001c8
> >>> [ 3.744849] ---[ end trace 485eaaecdce54971 ]---
> >>> [ 3.744849] RIP: 0010:security_port_sid+0x2f/0xb0
> >>> [ 3.744849] Code: 55 48 89 e5 41 57 49 89 ff 41 56 49 89 ce 41 55
> >>> 41 89 d5 41 54 41 89 f4 53 48 8b 7f 40 e8 c9 ca 94 00 49 8b 47 40 48
> >>> 8b 40 10 <48> 8b 98 c8 01 00 00 48 85 db 75 0e eb 65 48 8b 9b c0 00 00
> >>> 00 48
> >>> [ 3.744849] RSP: 0018:ffffb607c0013d00 EFLAGS: 00010246
> >>> [ 3.744849] RAX: 0000000000000000 RBX: ffffffffaef076f8 RCX: ffffb607c0013d9c
> >>> [ 3.744849] RDX: 0000000000000016 RSI: 0000000000000006 RDI: ffffffffaef08d10
> >>> [ 3.744849] RBP: ffffb607c0013d28 R08: 0000000000000218 R09: 0000000000000016
> >>> [ 3.744849] R10: ffffb607c0013d9c R11: ffff988ff9665260 R12: 0000000000000006
> >>> [ 3.744849] R13: 0000000000000016 R14: ffffb607c0013d9c R15: ffffffffaef05820
> >>> [ 3.744849] FS: 00007f5ef4fec840(0000) GS:ffff988ffbc00000(0000)
> >>> knlGS:0000000000000000
> >>> [ 3.744849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 3.744849] CR2: 00000000000001c8 CR3: 000000013b04c000 CR4: 00000000003506f0
> >>> [ 3.744849] Kernel panic - not syncing:
> >>> Fatal exception in interrupt
> >>> [ 3.744849] Kernel Offset: 0x2c000000 from 0xffffffff81000000
> >>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>> [ 3.744849] ---[ end Kernel panic - not syncing: Fatal exception in
> >>> interrupt ]---
> >>>
> >>> full test log link,
> >>> https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20200819/testrun/3084905/suite/linux-log-parser/test/check-kernel-panic-1682816/log
> >>>
> >>> Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> >> Thank you for the report. It appears from the log that you are enabling
> >> SELinux but not loading any policy? If that is correct, then I believe
> >> I know the underlying cause and can create a patch.
> > Yes, I'm guessing the bind() hook is the culprit.
> >
> > I'm beginning to think we should try forcing a run of the
> > selinux-testsuite on a system with SELinux enabled but without a
> > loaded policy. The test suite will fail in spectacular fashion, but
> > it will be a good way to shake out some of these corner cases.
>
> It's due to the lack of explicit selinux_initialized(state) guards in
> security_port_sid() and the rest of those functions. Previously, they
> happened to work because the policydb was statically allocated and could
> be accessed even before initial policy load. With the encapsulation of
> the policy state and dynamic allocation, they need to check
> selinux_initialized() first and return immediately if it isn't 1. I
> have a patch in the works.

Right. I was just saying that I was pretty sure the code path came in
via bind() ... which is obvious, since it is in the backtrace. I missed
that because I only looked at the location of the panic and worked the
code path backwards looking for the initialization check :)
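
For anyone following along, the kind of guard Stephen describes can be
sketched in userspace C roughly as follows. This is only an
illustration and not the actual patch: toy_state, toy_policy,
toy_port_sid() and TOY_DEFAULT_SID are hypothetical stand-ins for the
real SELinux state, policy, lookup function and default SID, and the
choice to hand back a default SID with a success return is an
assumption about what "return immediately" would look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder value, not a real SELinux initial SID. */
#define TOY_DEFAULT_SID 1u

/* Hypothetical stand-in for the dynamically allocated policy. */
struct toy_policy {
	unsigned int port_sid;
};

/* Hypothetical stand-in for the encapsulated SELinux state. */
struct toy_state {
	int initialized;           /* nonzero once a policy has been loaded */
	struct toy_policy *policy; /* NULL until the first policy load */
};

/*
 * Look up a SID for a port. The initialization check comes first so
 * that state->policy is never dereferenced while it is still NULL,
 * which is the failure mode in the oops above.
 */
static int toy_port_sid(const struct toy_state *state, unsigned int *out_sid)
{
	assert(state != NULL && out_sid != NULL);

	if (!state->initialized) {
		/* No policy loaded yet: return a default and bail out. */
		*out_sid = TOY_DEFAULT_SID;
		return 0;
	}

	/* Safe: a policy is present once initialized is set. */
	*out_sid = state->policy->port_sid;
	return 0;
}
```

Calling toy_port_sid() on a state with initialized == 0 and a NULL
policy pointer returns the default instead of faulting; without the
early return, the same call would dereference NULL much like the
reported crash.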

> With respect to testing, even just doing a
> simple boot test with SELinux enabled but no policy would have detected
> this one; it just isn't part of my usual workflow.

Which is fair, as it isn't really a valid use case, but we've seen it
pop up a few times now with everyone automating their testing without
understanding how to use/test SELinux properly. My thinking behind
running the test suite without a policy is to try to catch all the
cases where we aren't doing an initialization check before querying
any of the policy data. I know we squashed a bunch of these, but I'm
not convinced we caught them all (and of course we can always
introduce new bugs).

--
paul moore
www.paul-moore.com