Re: general protection fault in kvm_vm_ioctl_unregister_coalesced_mmio

From: Sean Christopherson
Date: Mon Apr 12 2021 - 12:50:45 EST


On Mon, Apr 12, 2021, Hao Sun wrote:
> Crash log:
> ==============================================
> kvm: failed to shrink bus, removing it completely
> general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP
> CPU: 3 PID: 7974 Comm: executor Not tainted 5.12.0-rc6+ #14
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:kvm_vm_ioctl_unregister_coalesced_mmio+0x88/0x1e0 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:183

Ugh, this code is a mess. On allocation failure, kvm_io_bus_unregister_dev()
nukes the entire bus and invokes the destructor for all _other_ devices on the
bus. The coalesced MMIO code is iterating over its list of devices, and while
list_for_each_entry_safe() can handle removal of the current entry, it blows up
when later entries are deleted out from under it.
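
To illustrate the hazard, here is a minimal userspace sketch, not the kernel
code. It assumes a hand-rolled doubly linked list and a made-up POISON marker
standing in for the kernel's list poison value (the faulting address
0xdead000000000100 looks like LIST_POISON1 on x86-64). The names node,
list_del_poison() and walk_hits_poison() are invented for the example. The
walk caches the next entry the way list_for_each_entry_safe() does, then a
side effect deletes that cached entry.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a list entry; not the kernel's struct list_head. */
struct node {
	struct node *next, *prev;
	int id;
};

/* Hypothetical poison marker: a real object, so the model never faults. */
static struct node poison_marker;
#define POISON (&poison_marker)

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink an entry and poison its pointers, as the kernel's list_del() does. */
static void list_del_poison(struct node *n)
{
	assert(n->next != POISON);	/* catch double deletion in the model */
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = POISON;
}

/* Returns 1 if the walk would step onto a poisoned pointer, 0 otherwise. */
static int walk_hits_poison(int delete_future_entry)
{
	struct node head, n[3];
	struct node *cur, *tmp;
	int i;

	list_init(&head);
	for (i = 0; i < 3; i++) {
		n[i].id = i;
		list_add_tail(&n[i], &head);
	}

	cur = head.next;
	while (cur != &head) {
		tmp = cur->next;		/* cached next, as in the _safe walk */
		if (cur->id == 0) {
			list_del_poison(cur);	/* removing the current entry is fine */
			if (delete_future_entry)
				list_del_poison(tmp);	/* side effect kills the cached entry */
		}
		if (tmp == POISON)
			return 1;		/* the kernel would dereference this */
		cur = tmp;
	}
	return 0;
}
```

Calling walk_hits_poison(0) models the normal case where only the current
entry is removed. walk_hits_poison(1) models the bus teardown destroying a
device the walk has not reached yet.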

That the coalesced MMIO code continues to iterate appears to stem from the fact
that KVM_UNREGISTER_COALESCED_MMIO doesn't require an exact match. Whether or
not this is intentional is probably a moot point since it's now baked into the
ABI.

Assuming we can't change kvm_vm_ioctl_unregister_coalesced_mmio() to stop
iterating on match, the least awful fix would be to return success/failure from
kvm_io_bus_unregister_dev() so that the coalesced MMIO code can stop walking
its list once the bus has been torn down.
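
A rough userspace model of that idea follows; it is a sketch, not a patch.
struct vm_model, model_unregister_dev() and model_unregister_all() are
hypothetical names, the zones are a plain array rather than a list, and the
-ENOMEM return value is an assumption about what the failure path would
report. The point is only that the caller can bail out of the walk once the
callee says the bus is gone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_ZONES 3

/* Hypothetical model of a VM whose zones all match the unregister request. */
struct vm_model {
	bool alive[NR_ZONES];	/* zone still exists and is safe to touch */
	bool fail_shrink;	/* simulate the bus reallocation failing */
	int stale_accesses;	/* visits to zones that were already destroyed */
};

/*
 * Stand-in for kvm_io_bus_unregister_dev() with a return value.  On the
 * simulated allocation failure it destroys every *other* zone, mirroring
 * what the error path does to the other devices on the bus.
 */
static int model_unregister_dev(struct vm_model *vm, int idx)
{
	int i;

	assert(idx >= 0 && idx < NR_ZONES);
	if (!vm->fail_shrink)
		return 0;
	for (i = 0; i < NR_ZONES; i++)
		if (i != idx)
			vm->alive[i] = false;
	return -ENOMEM;
}

/* Walk all zones; returns how many already-destroyed zones were touched. */
static int model_unregister_all(struct vm_model *vm, bool check_result)
{
	int i, r;

	for (i = 0; i < NR_ZONES; i++) {
		if (!vm->alive[i]) {
			/* The real walk would chase a poisoned pointer here. */
			vm->stale_accesses++;
			continue;
		}
		r = model_unregister_dev(vm, i);
		vm->alive[i] = false;	/* the caller destroys the target itself */
		if (check_result && r)
			break;		/* nothing left to walk, stop here */
	}
	return vm->stale_accesses;
}
```

With check_result set to false the model touches the two zones that the
failed unregister already destroyed. With it set to true the walk stops after
the first zone and never touches stale state.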

Note, there's a second bug in the error path in kvm_io_bus_unregister_dev(), as
it invokes the destructors before nullifying kvm->buses and synchronizing SRCU.
I.e. it's freeing devices on the bus while readers may be in flight. That can
be fixed by deferring the destruction until after SRCU synchronization.
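
The ordering issue can be shown with a toy trace; this is only a sketch under
stated assumptions. The three steps are labels for "nullify kvm->buses",
"synchronize SRCU" and "invoke the destructors", and the trace struct and
teardown_*() helpers are invented for the example. Nothing here calls real
SRCU APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Labels for the three teardown steps discussed above. */
enum step { UNPUBLISH, WAIT_FOR_READERS, DESTROY };

struct trace {
	enum step steps[3];
	size_t n;
};

static void record(struct trace *t, enum step s)
{
	assert(t->n < 3);
	t->steps[t->n++] = s;
}

/* Order described as buggy: devices are freed while readers may be in flight. */
static void teardown_buggy(struct trace *t)
{
	record(t, DESTROY);
	record(t, UNPUBLISH);
	record(t, WAIT_FOR_READERS);
}

/* Deferred order: hide the bus, wait out readers, only then destroy. */
static void teardown_deferred(struct trace *t)
{
	record(t, UNPUBLISH);
	record(t, WAIT_FOR_READERS);
	record(t, DESTROY);
}

/* Returns 1 if destruction happens only after readers have drained. */
static int destroy_is_last(const struct trace *t)
{
	return t->n == 3 && t->steps[2] == DESTROY;
}
```

Only the deferred order satisfies destroy_is_last(), which is the property
the second fix is after.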

I'll send patches unless someone has a better idea for fixing this.

> Code: 00 4c 89 74 24 18 4c 89 6c 24 20 48 8b 44 24 10 48 83 c0 08 48
> 89 44 24 28 48 89 5c 24 08 4c 89 24 24 4c 89 ff e8 d8 9f 49 00 <4d> 8b
> 37 48 89 df e8 3d 9b 49 00 8b 2b 49 8d 7f 2c e8 32 9b 49 00
> RSP: 0018:ffffc90005dfbd58 EFLAGS: 00010246
> RAX: ffff88800c3e7188 RBX: ffffc90005dfbe3c RCX: 0000000000000af0
> RDX: 0001000000000100 RSI: 000000000000cbab RDI: dead000000000100
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0001000000000107
> R10: 0001ffffffffffff R11: 00000000000001d2 R12: ffffc90005e7dff8
> R13: 0000000000004000 R14: dead000000000100 R15: dead000000000100
> FS: 00007ff1bb092700(0000) GS:ffff88807ed00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055d8946c4918 CR3: 0000000012d88000 CR4: 0000000000752ee0
> PKRU: 55555554
> Call Trace:
> kvm_vm_ioctl+0x6e1/0x1860 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3897
> vfs_ioctl fs/ioctl.c:48 [inline]
> __do_sys_ioctl fs/ioctl.c:753 [inline]
> __se_sys_ioctl+0xab/0x110 fs/ioctl.c:739
> __x64_sys_ioctl+0x3f/0x50 fs/ioctl.c:739
> do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x47338d