Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

From: Thomas Zimmermann
Date: Thu Jul 07 2022 - 06:10:38 EST


Hi

On 07.07.22 at 02:20, Mikhail Gavrilov wrote:
On Tue, Jun 28, 2022 at 2:21 PM Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:


Christian, can you look into why
drm_aperture_remove_conflicting_pci_framebuffers causes this kernel bug
on my machine?

Thanks for reporting. This bug has been fixed in


https://cgit.freedesktop.org/drm/drm/commit/?h=drm-fixes&id=ee7a69aa38d87a3bbced7b8245c732c05ed0c6ec

The patch should reach mainline next week or so.

Best regards
Thomas


[ 6.822385] amdgpu: Ignoring ACPI CRAT on non-APU system
[ 6.822462] amdgpu: Virtual CRAT table created for CPU
[ 6.822654] amdgpu: Topology: Add CPU node
[ 6.827643] Console: switching to colour dummy device 80x25
[ 6.845504] BUG: kernel NULL pointer dereference, address: 0000000000000038
[ 6.845509] #PF: supervisor read access in kernel mode
[ 6.845512] #PF: error_code(0x0000) - not-present page
[ 6.845515] PGD 0 P4D 0
[ 6.845518] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6.845522] CPU: 27 PID: 612 Comm: systemd-udevd Tainted: G W -------- --- 5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64 #1
[ 6.845528] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 6.845533] RIP: 0010:kernfs_find_and_get_ns+0x11/0x70
[ 6.845539] Code: 78 e8 c3 fa 31 00 48 85 c0 75 e1 eb 93 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 49 89 d5 41 54 49 89 f4 55 53 <48> 8b 47 38 48 89 fb 48 85 c0 48 0f 44 c7 48 8b a8 80 00 00 00 48
[ 6.845546] RSP: 0018:ffffa98c022f3aa0 EFLAGS: 00010246
[ 6.845550] RAX: 0000000000000000 RBX: ffffffffaf52c3c0 RCX: ffff9e150147b640
[ 6.845553] RDX: 0000000000000000 RSI: ffffffffaf52c508 RDI: 0000000000000000
[ 6.845557] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000249249d4
[ 6.845560] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffaf52c508
[ 6.845563] R13: 0000000000000000 R14: ffff9e157aa93900 R15: 0000000000000000
[ 6.845567] FS: 00007fabaafbf680(0000) GS:ffff9e23e6a00000(0000) knlGS:0000000000000000
[ 6.845571] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.845574] CR2: 0000000000000038 CR3: 000000017cb56000 CR4: 0000000000350ee0
[ 6.845578] Call Trace:
[ 6.845579] <TASK>
[ 6.845582] sysfs_unmerge_group+0x18/0x60
[ 6.845585] dpm_sysfs_remove+0x20/0x60
[ 6.845590] device_del+0xa4/0x3f0
[ 6.845594] platform_device_del.part.0+0x13/0x70
[ 6.845599] platform_device_unregister+0x1c/0x30
[ 6.845602] sysfb_disable+0x2d/0x60
[ 6.845605] remove_conflicting_framebuffers+0x1b/0xc0
[ 6.845610] remove_conflicting_pci_framebuffers+0xce/0x120
[ 6.845614] drm_aperture_remove_conflicting_pci_framebuffers+0x57/0x80
[ 6.845620] amdgpu_pci_probe+0xcb/0x360 [amdgpu]
[ 6.845760] local_pci_probe+0x41/0x80
[ 6.845764] pci_device_probe+0xaa/0x210
[ 6.845768] really_probe+0x1bf/0x390
[ 6.845771] __driver_probe_device+0xfc/0x170
[ 6.845775] driver_probe_device+0x1f/0x90
[ 6.845778] __driver_attach+0xbf/0x1b0
[ 6.845782] ? __device_attach_driver+0xe0/0xe0
[ 6.845785] bus_for_each_dev+0x65/0x90
[ 6.845789] bus_add_driver+0x15c/0x200
[ 6.845792] driver_register+0x89/0xe0
[ 6.845796] ? 0xffffffffc0c8d000
[ 6.845801] do_one_initcall+0x69/0x350
[ 6.845806] ? rcu_read_lock_sched_held+0x3c/0x70
[ 6.845810] ? trace_kmalloc+0x3c/0x100
[ 6.845814] ? kmem_cache_alloc_trace+0x1e8/0x350
[ 6.845818] do_init_module+0x4a/0x200
[ 6.845822] __do_sys_init_module+0x13a/0x190
[ 6.845827] do_syscall_64+0x5b/0x80
[ 6.845832] ? asm_exc_page_fault+0x27/0x30
[ 6.845835] ? lockdep_hardirqs_on+0x7d/0x100
[ 6.845839] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 6.845842] RIP: 0033:0x7fababb7463e
[ 6.845845] Code: 48 8b 0d e5 57 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b2 57 0c 00 f7 d8 64 89 01 48
[ 6.845852] RSP: 002b:00007ffc6a6c9658 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 6.845857] RAX: ffffffffffffffda RBX: 00005620deef53f0 RCX: 00007fababb7463e
[ 6.845860] RDX: 00005620deeb2df0 RSI: 00000000010bfac6 RDI: 00007faba943e010
[ 6.845864] RBP: 00005620deeb2df0 R08: 00005620deef4880 R09: 0000000000000000
[ 6.845867] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
[ 6.845870] R13: 00005620deeb5330 R14: 0000000000000000 R15: 00005620deef0410
[ 6.845875] </TASK>
[ 6.845877] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 crct10dif_pclmul gpu_sched crc32_pclmul crc32c_intel drm_buddy drm_display_helper ucsi_ccg nvme igb typec_ucsi ghash_clmulni_intel ccp cec typec sp5100_tco nvme_core dca wmi ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse
[ 6.845898] CR2: 0000000000000038
[ 6.845900] ---[ end trace 0000000000000000 ]---


$ /usr/src/kernels/5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/scripts/faddr2line \
    /lib/debug/lib/modules/5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.debug \
    amdgpu_pci_probe+0xcb
amdgpu_pci_probe+0xcb/0x360:
amdgpu_pci_probe at /usr/src/debug/kernel-5.19-rc5-49-gc1084b6c5620/linux-5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2061


$ cat -s -n /usr/src/debug/kernel-5.19-rc5-49-gc1084b6c5620/linux-5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c \
    | head -2071 | tail -20
2052               "Use radeon.cik_support=0 amdgpu.cik_support=1 to override.\n"
2053              );
2054          return -ENODEV;
2055      }
2056  }
2057  #endif
2058
2059  /* Get rid of things like offb */
2060  ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &amdgpu_kms_driver);
2061  if (ret)
2062      return ret;
2063
2064  adev = devm_drm_dev_alloc(&pdev->dev, &amdgpu_kms_driver, typeof(*adev), ddev);
2065  if (IS_ERR(adev))
2066      return PTR_ERR(adev);
2067
2068  adev->dev = &pdev->dev;
2069  adev->pdev = pdev;
2070  ddev = adev_to_drm(adev);

$ git blame -L 2052,2070 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
Blaming lines: 100% (19/19), done.
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2052) dev_info(&pdev->dev,
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2053) "Use radeon.cik_support=0 amdgpu.cik_support=1 to override.\n"
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2054) );
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2055) return -ENODEV;
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2056) }
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2057) }
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2058) #endif
984d7a929ad68 (Hans de Goede 2019-10-10 18:28:17 +0200 2059)
d38ceaf99ed01 (Alex Deucher 2015-04-20 16:55:21 -0400 2060) /* Get rid of things like offb */
97c9bfe3f6605 (Thomas Zimmermann 2021-06-29 15:58:33 +0200 2061) ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &amdgpu_kms_driver);
d38ceaf99ed01 (Alex Deucher 2015-04-20 16:55:21 -0400 2062) if (ret)
d38ceaf99ed01 (Alex Deucher 2015-04-20 16:55:21 -0400 2063) return ret;
d38ceaf99ed01 (Alex Deucher 2015-04-20 16:55:21 -0400 2064)
5088d6572e8ff (Luben Tuikov 2020-11-04 11:04:25 +0100 2065) adev = devm_drm_dev_alloc(&pdev->dev, &amdgpu_kms_driver, typeof(*adev), ddev);
df2ce4596c044 (Luben Tuikov 2020-09-18 15:25:04 +0200 2066) if (IS_ERR(adev))
df2ce4596c044 (Luben Tuikov 2020-09-18 15:25:04 +0200 2067) return PTR_ERR(adev);
8aba21b75136c (Luben Tuikov 2020-08-14 20:41:55 -0400 2068)
8aba21b75136c (Luben Tuikov 2020-08-14 20:41:55 -0400 2069) adev->dev = &pdev->dev;
8aba21b75136c (Luben Tuikov 2020-08-14 20:41:55 -0400 2070) adev->pdev = pdev;

Thomas, you recently changed this line. Can you tell why we are
hitting a kernel Oops here?

Full kernel log (5.19-rc5): https://pastebin.com/5Ag804bd


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
