"irq/matrix: Spread interrupts on allocation" breaks nouveau in mainline kernel

From: Lyude Paul
Date: Tue Jan 23 2018 - 17:01:21 EST


Hi! Sorry to be the bearer of bad news, but this patch seems to break
suspend and resume with nouveau on my machine:

[ 29.694755] PM: suspend entry (deep)
[ 29.694773] PM: Syncing filesystems ... done.
[ 29.696203] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 29.697442] OOM killer disabled.
[ 29.697448] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[ 29.698232] Suspending console(s) (use no_console_suspend to debug)
[ 29.698993] serial 00:05: disabled
[ 29.708227] sd 4:0:0:0: [sda] Synchronizing SCSI cache
[ 29.708428] sd 4:0:0:0: [sda] Stopping disk
[ 30.614581] ACPI: Preparing to enter system sleep state S3
[ 30.917726] PM: Saving platform NVS memory
[ 30.917736] Disabling non-boot CPUs ...
[ 30.925616] smpboot: CPU 1 is now offline
[ 30.936915] smpboot: CPU 2 is now offline
[ 30.952824] smpboot: CPU 3 is now offline
[ 30.964764] smpboot: CPU 4 is now offline
[ 30.980663] smpboot: CPU 5 is now offline
[ 30.992692] smpboot: CPU 6 is now offline
[ 31.002572] smpboot: CPU 7 is now offline
[ 31.003130] ACPI: Low-level resume complete
[ 31.003180] PM: Restoring platform NVS memory
[ 31.003578] WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
[ 31.003578] Modules linked in: nouveau video mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm vfat fat usbhid crc32_pclmul i2c_piix4 i2c_core shpchp k10temp wmi acpi_cpufreq crc32c_intel r8169 mii xhci_pci xhci_hcd w83627hf_wdt
[ 31.003590] CPU: 0 PID: 11523 Comm: rtcwake Not tainted 4.15.0-rc8nouveau-clockgating+ #1
[ 31.003591] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.60 09/19/2017
[ 31.003592] RIP: 0010:smp_call_function_single+0xdc/0xe0
[ 31.003593] RSP: 0018:ffffc900004a3c40 EFLAGS: 00010046
[ 31.003594] RAX: 0000000000000000 RBX: ffffc900004a3cdc RCX: 0000000000000001
[ 31.003594] RDX: ffffc900004a3c98 RSI: ffffffff8137a180 RDI: 0000000000000000
[ 31.003595] RBP: ffffc900004a3c70 R08: 0000000000000001 R09: 0000000000010000
[ 31.003595] R10: ffffc900004a3c98 R11: 0000000000000000 R12: 0000000000000000
[ 31.003596] R13: 0000000001000000 R14: ffffc900004a3d0c R15: 0000000000000000
[ 31.003597] FS: 00007f03bee93540(0000) GS:ffff88021ae00000(0000) knlGS:0000000000000000
[ 31.003597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.003598] CR2: 00007fffb6673008 CR3: 000000020ddd4000 CR4: 00000000003406f0
[ 31.003598] Call Trace:
[ 31.003603] ? rdmsr_safe_on_cpu+0x4b/0x70
[ 31.003604] rdmsr_safe_on_cpu+0x4b/0x70
[ 31.003606] get_block_address.isra.0+0x6e/0xe0
[ 31.003607] mce_amd_feature_init+0x63/0x2c0
[ 31.003609] mce_syscore_resume+0x1e/0x30
[ 31.003611] syscore_resume+0x4b/0x170
[ 31.003613] suspend_devices_and_enter+0x608/0x7e0
[ 31.003614] pm_suspend+0x315/0x380
[ 31.003615] state_store+0x7d/0xe0
[ 31.003618] kernfs_fop_write+0xfa/0x180
[ 31.003620] __vfs_write+0x23/0x130
[ 31.003623] ? SYSC_newfstat+0x29/0x40
[ 31.003625] ? _cond_resched+0x15/0x40
[ 31.003626] vfs_write+0xad/0x1a0
[ 31.003627] SyS_write+0x42/0x90
[ 31.003629] entry_SYSCALL_64_fastpath+0x24/0x87
[ 31.003630] RIP: 0033:0x7f03be9ae8f4
[ 31.003631] RSP: 002b:00007ffe6bf825f8 EFLAGS: 00000246
[ 31.003632] Code: fe ff ff 8b 55 e8 83 e2 01 74 0a f3 90 8b 55 e8 83 e2 01 75 f6 48 83 c4 28 41 5a 5d 49 8d 62 f8 c3 8b 05 58 b6 48 01 85 c0 75 86 <0f> ff eb 82 0f 1f 44 00 00 f6 46 18 01 75 15 c7 46 18 01 00 00
[ 31.003648] ---[ end trace 19fa2f7781ed5237 ]---
[ 31.004025] Enabling non-boot CPUs ...
[ 31.004052] x86: Booting SMP configuration:
[ 31.004052] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 31.006368] cache: parent cpu1 should not be sleeping
[ 31.006442] microcode: CPU1: patch_level=0x08001129
[ 31.006509] CPU1 is up
[ 31.006525] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 31.008832] cache: parent cpu2 should not be sleeping
[ 31.008894] microcode: CPU2: patch_level=0x08001129
[ 31.008966] CPU2 is up
[ 31.008975] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 31.011264] cache: parent cpu3 should not be sleeping
[ 31.011329] microcode: CPU3: patch_level=0x08001129
[ 31.011404] CPU3 is up
[ 31.011413] smpboot: Booting Node 0 Processor 4 APIC 0x8
[ 31.013833] cache: parent cpu4 should not be sleeping
[ 31.013903] microcode: CPU4: patch_level=0x08001129
[ 31.014025] CPU4 is up
[ 31.014036] smpboot: Booting Node 0 Processor 5 APIC 0x9
[ 31.016354] cache: parent cpu5 should not be sleeping
[ 31.016421] microcode: CPU5: patch_level=0x08001129
[ 31.016534] CPU5 is up
[ 31.016544] smpboot: Booting Node 0 Processor 6 APIC 0xa
[ 31.018857] cache: parent cpu6 should not be sleeping
[ 31.018930] microcode: CPU6: patch_level=0x08001129
[ 31.019047] CPU6 is up
[ 31.019057] smpboot: Booting Node 0 Processor 7 APIC 0xb
[ 31.021376] cache: parent cpu7 should not be sleeping
[ 31.021444] microcode: CPU7: patch_level=0x08001129
[ 31.021579] CPU7 is up
[ 31.022166] ACPI: Waking up from system sleep state S3
[ 31.070791] usb usb1: root hub lost power or was reset
[ 31.070794] usb usb2: root hub lost power or was reset
[ 31.071628] serial 00:05: activated
[ 31.080265] sd 4:0:0:0: [sda] Starting disk
[ 31.126099] hpet_rtc_timer_reinit: 68 callbacks suppressed
[ 31.126099] hpet1: lost 2 rtc interrupts
[ 31.160913] r8169 0000:1e:00.0 enp30s0: link down
[ 31.255563] do_IRQ: 1.35 No irq handler for vector
[ 31.379537] ata6: SATA link down (SStatus 0 SControl 300)
[ 31.379558] ata1: SATA link down (SStatus 0 SControl 300)
[ 31.380306] ata2: SATA link down (SStatus 0 SControl 300)
[ 31.435705] ata9: SATA link down (SStatus 0 SControl 300)
[ 31.589932] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 31.590320] ata5.00: configured for UDMA/133
[ 31.610043] usb 1-4: reset low-speed USB device number 2 using xhci_hcd
[ 32.226138] usb 1-5: reset low-speed USB device number 3 using xhci_hcd
[ 33.257867] nouveau 0000:22:00.0: DRM: EVO timeout
[ 34.237185] r8169 0000:1e:00.0 enp30s0: link up
[ 35.257880] nouveau 0000:22:00.0: DRM: base-0: timeout
[ 37.258334] nouveau 0000:22:00.0: DRM: base-0: timeout
[ 37.276084] OOM killer enabled.
[ 37.276612] Restarting tasks ... done.
[ 37.277722] PM: suspend exit

I haven't investigated why this happens yet, but a bisect of master led me
to this commit.
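
For anyone wanting to check a saved dmesg for the same symptoms, the three
telltale messages in the log above (the smp.c warning, the stray vector, and
the nouveau timeouts) can be picked out with a rough sketch like the one
below. The patterns are copied from the log; the function name, the labels,
and the idea of scanning a saved log are illustrative assumptions, not part
of any kernel tooling.

```python
import re

# Patterns taken from the messages in the log above.
# The dictionary keys are hypothetical labels chosen for this sketch.
SYMPTOMS = {
    "smp_warning": re.compile(r"WARNING: .* at kernel/smp\.c:\d+"),
    "stray_vector": re.compile(r"do_IRQ: \d+\.\d+ No irq handler for vector"),
    "nouveau_timeout": re.compile(r"nouveau [0-9a-f:.]+: DRM: .*timeout"),
}


def find_symptoms(dmesg_text):
    """Return {label: [matching lines]} for each symptom present in the text."""
    hits = {}
    for line in dmesg_text.splitlines():
        for label, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(line.strip())
    return hits
```

Calling find_symptoms() on the contents of a saved log (for example the
output of "dmesg > dmesg.txt") returns only the labels that matched, so an
empty result means none of these three messages were present.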