5.3-rc3: Frozen graphics with kcompactd migrating i915 pages

From: Martin Wilck
Date: Fri Aug 09 2019 - 08:44:13 EST


This happened to me today while running kernel 5.3.0-rc3-1.g571863b-default
(5.3-rc3 with just a few patches on top), after starting a KVM virtual
machine. The X screen froze. Remote login via ssh was still possible,
so I was able to retrieve basic logs.

sysrq-w showed two blocked processes (kcompactd0 and KVM). A minute
later, the same two processes were still blocked. KVM appears to be
waiting for a mutex in userptr_mn_invalidate_range_start() that
kcompactd is holding. kcompactd, in turn, is blocked in __lock_page()
(reached via set_page_dirty_lock()), sleeping in io_schedule() on a
page owned by the i915 driver.

kcompactd stack:

Aug 09 12:12:48 apollon.suse.de kernel: sysrq: Show Blocked State
Aug 09 12:12:48 apollon.suse.de kernel: task PC stack pid father
Aug 09 12:12:48 apollon.suse.de kernel: kcompactd0 D 0 43 2 0x80004000
Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
Aug 09 12:12:48 apollon.suse.de kernel: io_schedule+0x12/0x40
Aug 09 12:12:48 apollon.suse.de kernel: __lock_page+0x123/0x200
Aug 09 12:12:48 apollon.suse.de kernel: ? gen8_ppgtt_clear_pdp+0xc0/0x140 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: ? file_fdatawait_range+0x20/0x20
Aug 09 12:12:48 apollon.suse.de kernel: set_page_dirty_lock+0x49/0x50
Aug 09 12:12:48 apollon.suse.de kernel: i915_gem_userptr_put_pages+0x13f/0x1c0 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: __i915_gem_object_put_pages+0x5e/0xa0 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1ff/0x220 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
Aug 09 12:12:48 apollon.suse.de kernel: ? __mod_lruvec_state+0x3f/0xf0
Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
Aug 09 12:12:48 apollon.suse.de kernel: ? entry_SYSCALL_64_after_hwframe+0xb8/0xbe
Aug 09 12:12:48 apollon.suse.de kernel: kcompactd_do_work+0x120/0x290


KVM stack:

Aug 09 12:12:48 apollon.suse.de kernel: CPU 0/KVM D 0 25189 1 0x00000320
Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
Aug 09 12:12:48 apollon.suse.de kernel: schedule_preempt_disabled+0xa/0x10
Aug 09 12:12:48 apollon.suse.de kernel: __mutex_lock.isra.0+0x172/0x4d0
Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1bf/0x220 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
Aug 09 12:12:48 apollon.suse.de kernel: compact_zone_order+0xc6/0xf0
Aug 09 12:12:48 apollon.suse.de kernel: try_to_compact_pages+0xcc/0x2a0
Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_direct_compact+0x7c/0x150
Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_slowpath+0x1ee/0xd00
Aug 09 12:12:48 apollon.suse.de kernel: ? vmx_vcpu_load+0x100/0x120 [kvm_intel]
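
One way to see where the two stacks part ways is to compare them frame
by frame, starting from the caller side. The Python sketch below is only
an illustration: the frame lists are copied by hand from the two traces
above (the "?" entries are left out and the lists stop at
try_to_unmap_one), and the helper names parse() and divergence() are
made up for this example.

```python
import re

# Reliable frames copied from the kcompactd0 trace above, innermost first.
KCOMPACTD = """
io_schedule+0x12/0x40
__lock_page+0x123/0x200
set_page_dirty_lock+0x49/0x50
i915_gem_userptr_put_pages+0x13f/0x1c0 [i915]
__i915_gem_object_put_pages+0x5e/0xa0 [i915]
userptr_mn_invalidate_range_start+0x1ff/0x220 [i915]
__mmu_notifier_invalidate_range_start+0x57/0xa0
try_to_unmap_one+0xa0b/0xae0
"""

# Reliable frames copied from the "CPU 0/KVM" trace above, innermost first.
KVM = """
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.0+0x172/0x4d0
userptr_mn_invalidate_range_start+0x1bf/0x220 [i915]
__mmu_notifier_invalidate_range_start+0x57/0xa0
try_to_unmap_one+0xa0b/0xae0
"""

# symbol+offset/size, optionally followed by a [module] tag we ignore.
FRAME = re.compile(r"^(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/0x[0-9a-f]+")


def parse(trace):
    """Return a list of (symbol, offset) tuples, innermost frame first."""
    frames = []
    for line in trace.strip().splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group("sym"), match.group("off")))
    return frames


def divergence(a, b):
    """Walk both stacks from the caller side inward.

    Returns the symbols of the frames that are identical in both stacks,
    followed by the first pair of frames that differ (or None, None).
    """
    shared = []
    for frame_a, frame_b in zip(reversed(a), reversed(b)):
        if frame_a != frame_b:
            return shared, frame_a, frame_b
        shared.append(frame_a[0])
    return shared, None, None


shared, in_kcompactd, in_kvm = divergence(parse(KCOMPACTD), parse(KVM))
print("shared frames:", shared)
print("kcompactd0 continues at:", in_kcompactd)
print("KVM continues at:", in_kvm)
```

For these inputs the shared frames are try_to_unmap_one and
__mmu_notifier_invalidate_range_start, and the stacks split inside
userptr_mn_invalidate_range_start: kcompactd0 is at offset 0x1ff, on
the way into i915_gem_userptr_put_pages, while KVM is at offset 0x1bf,
sleeping in __mutex_lock.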

Full logs can be found at https://pastebin.com/KJ6tccj4
I haven't yet checked whether this is reproducible.

Regards
Martin