Re: kvm: use-after-free in process_srcu

From: Dmitry Vyukov
Date: Mon Jan 16 2017 - 16:34:59 EST


On Sun, Jan 15, 2017 at 6:11 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Fri, Jan 13, 2017 at 10:19 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> On Fri, Jan 13, 2017 at 4:30 AM, Steve Rutherford
>> <srutherford@xxxxxxxxxx> wrote:
>>> I'm not that familiar with the kernel's workqueues, but this seems
>>> like the classic "callback outlives the memory it references"
>>> use-after-free, where the process_srcu callback is outliving struct
>>> kvm (which contains the srcu_struct). If that's right, then calling
>>> srcu_barrier (which should wait for all of the call_srcu callbacks to
>>> complete, which are what enqueue the process_srcu callbacks) before
>>> cleanup_srcu_struct in kvm_destroy_vm probably fixes this.
>>>
>>> The corresponding patch to virt/kvm/kvm_main.c looks something like:
>>> static void kvm_destroy_vm(struct kvm *kvm)
>>> {
>>> ...
>>> for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>>> kvm_free_memslots(kvm, kvm->memslots[i]);
>>> + srcu_barrier(&kvm->irq_srcu);
>>> cleanup_srcu_struct(&kvm->irq_srcu);
>>> + srcu_barrier(&kvm->srcu);
>>> cleanup_srcu_struct(&kvm->srcu);
>>> ...
>>>
>>>
>>> Since we don't have a repro, this obviously won't be readily testable.
>>> I find srcu subtle enough that I don't trust my reasoning fully (in
>>> particular, I don't trust that waiting for all of the call_srcu
>>> callbacks to complete also waits for all of the process_srcu
>>> callbacks). Someone else know if that's the case?
>>
>>
>> From the function description it looks like it should do the trick:
>>
>> 514 /**
>> 515 * srcu_barrier - Wait until all in-flight call_srcu() callbacks complete.
>> 516 * @sp: srcu_struct on which to wait for in-flight callbacks.
>> 517 */
>> 518 void srcu_barrier(struct srcu_struct *sp)
>>
>> I see this failure happening several times per day. I've applied your
>> patch locally and will check if I see these failures happening.
>
>
> I have not seen the crash in 3 days, when usually I see several
> crashes per night. So I think we can consider that the patch fixes the
> crash:
>
> Tested-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>


Unfortunately I hit it again with the patch applied. It definitely
happens less frequently now, but still happens:


BUG: KASAN: use-after-free in rcu_batch_empty kernel/rcu/srcu.c:64
[inline] at addr ffff88006ada0218
BUG: KASAN: use-after-free in rcu_batch_dequeue kernel/rcu/srcu.c:75
[inline] at addr ffff88006ada0218
BUG: KASAN: use-after-free in srcu_invoke_callbacks
kernel/rcu/srcu.c:624 [inline] at addr ffff88006ada0218
BUG: KASAN: use-after-free in process_srcu+0x275/0x280
kernel/rcu/srcu.c:672 at addr ffff88006ada0218
Read of size 8 by task kworker/2:2/1073
CPU: 2 PID: 1073 Comm: kworker/2:2 Not tainted 4.10.0-rc3+ #164
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events_power_efficient process_srcu
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
kasan_object_err+0x1c/0x70 mm/kasan/report.c:165
print_address_description mm/kasan/report.c:203 [inline]
kasan_report_error mm/kasan/report.c:287 [inline]
kasan_report+0x1b6/0x460 mm/kasan/report.c:307
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
rcu_batch_empty kernel/rcu/srcu.c:64 [inline]
rcu_batch_dequeue kernel/rcu/srcu.c:75 [inline]
srcu_invoke_callbacks kernel/rcu/srcu.c:624 [inline]
process_srcu+0x275/0x280 kernel/rcu/srcu.c:672
process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098
worker_thread+0x223/0x1990 kernel/workqueue.c:2232
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:427
Object at ffff88006ada00c0, in cache kmalloc-16384 size: 16384
Allocated:
PID = 32313
[<ffffffff812b2686>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
[<ffffffff81a0e3d3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
[<ffffffff81a0e69a>] set_track mm/kasan/kasan.c:514 [inline]
[<ffffffff81a0e69a>] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605
[<ffffffff81a0aeeb>] kmem_cache_alloc_trace+0x10b/0x670 mm/slab.c:3629
[<ffffffff8107910e>] kvm_arch_alloc_vm include/linux/slab.h:490 [inline]
[<ffffffff8107910e>] kvm_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:613 [inline]
[<ffffffff8107910e>] kvm_dev_ioctl_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:3175 [inline]
[<ffffffff8107910e>] kvm_dev_ioctl+0x1be/0x11b0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:3219
[<ffffffff81aa55bf>] vfs_ioctl fs/ioctl.c:43 [inline]
[<ffffffff81aa55bf>] do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
[<ffffffff81aa6c1f>] SYSC_ioctl fs/ioctl.c:698 [inline]
[<ffffffff81aa6c1f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
[<ffffffff841c9941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 32310
[<ffffffff812b2686>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
[<ffffffff81a0e3d3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:502
[<ffffffff81a0ed0f>] set_track mm/kasan/kasan.c:514 [inline]
[<ffffffff81a0ed0f>] kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578
[<ffffffff81a0cbc3>] __cache_free mm/slab.c:3505 [inline]
[<ffffffff81a0cbc3>] kfree+0xd3/0x250 mm/slab.c:3822
[<ffffffff81070579>] kvm_arch_free_vm include/linux/kvm_host.h:781 [inline]
[<ffffffff81070579>] kvm_destroy_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:744 [inline]
[<ffffffff81070579>] kvm_put_kvm+0x719/0x9a0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:759
[<ffffffff81070952>] kvm_vm_release+0x42/0x50
arch/x86/kvm/../../../virt/kvm/kvm_main.c:770
[<ffffffff81a5f872>] __fput+0x332/0x7f0 fs/file_table.c:208
[<ffffffff81a5fdb5>] ____fput+0x15/0x20 fs/file_table.c:244
[<ffffffff814ca25a>] task_work_run+0x18a/0x260 kernel/task_work.c:116
[<ffffffff81454a57>] exit_task_work include/linux/task_work.h:21 [inline]
[<ffffffff81454a57>] do_exit+0x18e7/0x28a0 kernel/exit.c:839
[<ffffffff8145a469>] do_group_exit+0x149/0x420 kernel/exit.c:943
[<ffffffff81489730>] get_signal+0x7e0/0x1820 kernel/signal.c:2313
[<ffffffff8127ca92>] do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:807
[<ffffffff81007900>] exit_to_usermode_loop+0x200/0x2a0
arch/x86/entry/common.c:156
[<ffffffff81009413>] prepare_exit_to_usermode
arch/x86/entry/common.c:190 [inline]
[<ffffffff81009413>] syscall_return_slowpath+0x4d3/0x570
arch/x86/entry/common.c:259
[<ffffffff841c99e2>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Memory state around the buggy address:
ffff88006ada0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88006ada0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88006ada0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88006ada0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88006ada0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: panic_on_warn set ...

CPU: 2 PID: 1073 Comm: kworker/2:2 Tainted: G B 4.10.0-rc3+ #164
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events_power_efficient process_srcu
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
panic+0x1fb/0x412 kernel/panic.c:179
kasan_end_report+0x5b/0x60 mm/kasan/report.c:144
kasan_report_error mm/kasan/report.c:291 [inline]
kasan_report+0x386/0x460 mm/kasan/report.c:307
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
rcu_batch_empty kernel/rcu/srcu.c:64 [inline]
rcu_batch_dequeue kernel/rcu/srcu.c:75 [inline]
srcu_invoke_callbacks kernel/rcu/srcu.c:624 [inline]
process_srcu+0x275/0x280 kernel/rcu/srcu.c:672
process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2098
worker_thread+0x223/0x1990 kernel/workqueue.c:2232
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:427
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled