Re: How to debug these general protection fault: 0000 [#1] SMP KASAN PTI issues ?

From: Dmitry Vyukov
Date: Mon Apr 29 2019 - 10:24:31 EST

Next message: Greg KH: "Re: [PATCH v7 resend 1/2] Provide in-kernel headers to make extending kernel easier"
Previous message: Joerg Roedel: "Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Apr 26, 2019 at 8:40 PM JohnD Oracle <johndonoracle@xxxxxxxxx> wrote:
>
> Hi
>
>
> I am seeing a number of miss leading information in these reports , and I don't have an adequate understanding how KASAN
> works in order to know to debug it.
>
> For instance;
>
> Lets look at this event :
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN PTI
>
> CPU: 0 PID: 2823 Comm: test2 Not tainted 4.14.35.jpd-ksan.01.-syzkaller #22
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> task: ffff88805ac89780 task.stack: ffff888054920000
> RIP: 0010:vhost_vsock_dev_release+0x10f/0x450 [vhost_vsock]
> RSP: 0018:ffff888054927be8 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 727574616e676973 RCX: 1ffff1100a924f76
> RDX: 0e4eae8c2dcced2e RSI: ffff88805a4a0500 RDI: ffff88807e5e8bb8
> RBP: ffff888054927c38 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffed100a924f3d R11: ffff8880549279ef R12: ffff88807e5e0000
> R13: 00686769685f7265 R14: ffff88807e5e8bc0 R15: ffffffffc04e2a30
> FS: 0000000000000000(0000) GS:ffff88805e400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe5af16c6c8 CR3: 0000000003a0e000 CR4: 00000000000006f0
> Call Trace:
> ? ima_file_free+0xb6/0x316
> ? vhost_vsock_dev_open+0x2d0/0x2d0 [vhost_vsock]
> __fput+0x25d/0x775
> ____fput+0x1a/0x1d
> task_work_run+0x12e/0x18f
> do_exit+0x6ee/0x2a5e
>
> Issue number 1 :
>
> The static trace shows vhost_vsock_dev_open() ; but in reality , we are in a system EXIT call closing open file descriptors because the RIP is:
>
> RIP: 0010:vhost_vsock_dev_release+0x10f
>
> Ok ; So the stack is dirty with old information, but using GDB, I set a breakpoint at ksan die exception handler that generates the kernel trace message
> that include vhost_vsock_dev_open() :
>
>
> #0 kasan_die_handler (self=0xffffffff83aa8420 <kasan_die_notifier>, val=0x9, data=0xffff888054927a60) at arch/x86/mm/kasan_init_64.c:245
> #1 0xffffffff8120363e in notifier_call_chain (nl=<optimized out>, val=<optimized out>, v=<optimized out>, nr_to_call=0xfffffffb,
> nr_calls=0x0 <irq_stack_union>) at kernel/notifier.c:93
> #2 0xffffffff81203d2b in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>,
> val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:183
> #3 atomic_notifier_call_chain (v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:193
> #4 notify_die (val=<optimized out>, str=<optimized out>, regs=<optimized out>, err=0x0, trap=0xd, sig=0xb) at kernel/notifier.c:549
> #5 0xffffffff8108997f in do_general_protection (regs=0xffff888054927b38, error_code=0x0) at arch/x86/kernel/traps.c:558
> #6 0xffffffff82e037bc in general_protection () at arch/x86/entry/entry_64.S:1275
> #7 0xffffffffc04e2a30 in vhost_vsock_dev_open (inode=<optimized out>, file=0x0 <irq_stack_union>) at drivers/vhost/vsock.c:526
>
> What am I suppose to believe ? Are we calling vsock_open() from the exit or vsock_dev_release() ?
>
>
> What really gets confusing is what the dis-assembling shows around the gdb exception "
>
> #7 0xffffffffc04e2a30 in vhost_vsock_dev_open (inode=<optimized out>, file=0x0 <irq_stack_union>) at drivers/vhost/vsock.c:526
>
>
> Lets look at failing instructions around at vsock_open() 0xffffffffc04e2a30 :
>
> 0xffffffffc04e2a19 <vhost_vsock_dev_open+697>: callq 0xffffffff816a94c0 <__asan_report_store8_noabort>
> 0xffffffffc04e2a1e <vhost_vsock_dev_open+702>: jmpq 0xffffffffc04e2848 <vhost_vsock_dev_open+232>
> 0xffffffffc04e2a23 <vhost_vsock_dev_open+707>: mov %r12,%rdi
> 0xffffffffc04e2a26 <vhost_vsock_dev_open+710>: callq 0xffffffff816a94c0 <__asan_report_store8_noabort>
> 0xffffffffc04e2a2b <vhost_vsock_dev_open+715>: jmpq 0xffffffffc04e286f <vhost_vsock_dev_open+271>
> 0xffffffffc04e2a30 <vhost_vsock_dev_release>: nopl 0x0(%rax,%rax,1) <<<<<<<<<< FAILING ADDRESS
>
> How can I be getting a fault on a 5 byte NOP instruction : nopl 0x0(%rax,%rax,1)
>
> ax is 0xdffffc0000000000; so it should be a move register ax to ax.
>
> I have no idea what the inserted functions : asan_report_store8_noabort() are; They don't appear in the kernel source that I can find.
>
> Lets look at the RIP from ksan die() :
>
>
> 353 void die(const char *str, struct pt_regs *regs, long err)
> 354 {
> 355 unsigned long flags = oops_begin();
> 356 int sig = SIGSEGV;
> 357
> 358 if (__die(str, regs, err))
> (gdb) p *regs
> $1 = {
> r15 = 0xffffffffc04e2a30,
> r14 = 0xffff888053a38bc0,
> r13 = 0x483a750000002825,
> r12 = 0xffff888053a30000,
> bp = 0xffff888056347c38,
> bx = 0x415c415d5b48c483,
> r11 = 0xffff8880563479ef,
> r10 = 0xffffed100ac68f3d,
> r9 = 0x0,
> r8 = 0x0,
> ax = 0xdffffc0000000000,
> cx = 0x1ffff1100ac68f76,
> dx = 0x82b882bab691890,
> si = 0xffff88807eef5b40,
> di = 0xffff888053a38bb8,
> orig_ax = 0xffffffffffffffff,
> ip = 0xffffffffc04e2b3f,
> cs = 0x10,
> flags = 0x10206,
> sp = 0xffff888056347be8,
> ss = 0x18
> }
>
> the regs.ip == 0xffffffffc04e2b3f
> 0xffffffffc04e2b3f <vhost_vsock_dev_release+271>: cmpb $0x0,(%rdx,%rax,1)
> 0xffffffffc04e2b43 <vhost_vsock_dev_release+275>: jne 0xffffffffc04e2e50 <vhost_vsock_dev_release+1056>
> (gdb)
>
>
>
> (gdb) x/2i 0xffffffffc04e2b3f
> 0xffffffffc04e2b3f <vhost_vsock_dev_release+271>: cmpb $0x0,(%rdx,%rax,1)
> 0xffffffffc04e2b43 <vhost_vsock_dev_release+275>: jne 0xffffffffc04e2e50 <vhost_vsock_dev_release+1056>
>
>
> I can believe getting a fault doing a cmpb (dx,ax )
>
> Is this a complex instruction comparing *dx to *ax ? the the register contents ?
>
>
> ax =
>
> (gdb) x/2b 0xdffffc0000000000
> 0xdffffc0000000000: Cannot access memory at address 0xdffffc0000000000
>
> dx =
>
> (gdb) x/2b 0x82b882bab691890
> 0x82b882bab691890: Cannot access memory at address 0x82b882bab691890
>
>
> Debugging suggestions welcome !
>
> JD
>
> ===
>
>
> Attached is the test case.
your info,

Hi JohnD,

Are you debugging some syzbot-reported bug? Which one? It's useful to
keep this in the same email thread as the report in somebody else will
look at it later (or maybe debugging the same at the same time).
syzkaller-bugs@ is generally not read by anyone and is only CCed in
syzbot reports. We should direct people to syzkaller@ mailing list
everywhere.

Looking at the info, it seems that the crash happens in
vhost_vsock_dev_release and gdb improperly unwinds kernel (maybe you
need a latest gdb or something).
__fput generally calls callbacks that close/release/destroy something.
And it seems that vhost_vsock_dev_release perfectly matches this
definition.
Also "cmpb $0x0,(%rdx,%rax,1)" looks like KASAN shadow check and is
the instruction where NULL derefs usually caught.
This instruction does "if (*(byte*)(rax + rdx) == 0)". Perhaps you
better turn off CONFIG_KASAN to make debugging simpler. Since it's not
KASAN-detected crash, there is no point enabling KASAN.
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static bool write_file(const char * file,
const char * what, ...) {
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}

static void setup_common() {
if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {}
}

static void loop();

static void sandbox_common() {
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
setsid();
struct rlimit rlim;
rlim.rlim_cur = rlim.rlim_max = (200 << 20);
setrlimit(RLIMIT_AS, & rlim);
rlim.rlim_cur = rlim.rlim_max = 32 << 20;
setrlimit(RLIMIT_MEMLOCK, & rlim);
rlim.rlim_cur = rlim.rlim_max = 136 << 20;
setrlimit(RLIMIT_FSIZE, & rlim);
rlim.rlim_cur = rlim.rlim_max = 1 << 20;
setrlimit(RLIMIT_STACK, & rlim);
rlim.rlim_cur = rlim.rlim_max = 0;
setrlimit(RLIMIT_CORE, & rlim);
rlim.rlim_cur = rlim.rlim_max = 256;
setrlimit(RLIMIT_NOFILE, & rlim);
if (unshare(CLONE_NEWNS)) {}
if (unshare(CLONE_NEWIPC)) {}
if (unshare(0x02000000)) {}
if (unshare(CLONE_NEWUTS)) {}
if (unshare(CLONE_SYSVSEM)) {}
typedef struct {
const char * name;
const char * value;
}
sysctl_t;
static
const sysctl_t sysctls[] = {
{
"/proc/sys/kernel/shmmax",
"16777216"
},
{
"/proc/sys/kernel/shmall",
"536870912"
},
{
"/proc/sys/kernel/shmmni",
"1024"
},
{
"/proc/sys/kernel/msgmax",
"8192"
},
{
"/proc/sys/kernel/msgmni",
"1024"
},
{
"/proc/sys/kernel/msgmnb",
"1024"
},
{
"/proc/sys/kernel/sem",
"1024 1048576 500 1024"
},
};
unsigned i;
return;
for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
write_file(sysctls[i].name, sysctls[i].value);
}

int wait_for_loop(int pid) {
if (pid < 0)
exit(1);
int status = 0;
while (waitpid(-1, & status, __WALL) != pid) {}
printf("child done\n");

return WEXITSTATUS(status);
}

static int do_sandbox_none(void) {
//if (unshare(CLONE_NEWPID)) {}
int pid = fork();
if (pid != 0)
return wait_for_loop(pid);
//pause();
// child continues here

setup_common();
sandbox_common();
// if (unshare(CLONE_NEWNET)) {}
loop();
printf("child exiting \n");
exit(1);
}

void loop(void) {
memcpy((void * ) 0x20000000, "/dev/vhost-vsock\000", 17);
syscall(__NR_openat, 0xffffffffffffff9c, 0x20000000, 2, 0);
syscall(__NR_dup2, -1, -1);

}
int main(void) {
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
do_sandbox_none();
return 0;
}

Next message: Greg KH: "Re: [PATCH v7 resend 1/2] Provide in-kernel headers to make extending kernel easier"
Previous message: Joerg Roedel: "Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]