On 4/23/2025 2:12 PM, Mario Limonciello wrote:
On 4/23/2025 10:18 AM, Mario Limonciello wrote:
On 4/23/2025 10:06 AM, Mark Brown wrote:
On Wed, Apr 16, 2025 at 01:20:33PM +0200, Jacek Luczak wrote:
Hi,
On my ASUS Vivobook S16 (and on similar ASUS HW - see [1]), the system
dies on resume from suspend (no logs available) soon after the GPU
completes resume - I can see the login screen, but only a power cycle helps.
Are there any updates on this from the AMD side? As things stand my
inclination is to revert the bulk of the changes to the driver from the
past merge window. I don't really know anything about this hardware
specifically and "dies without logs" obviously gives few hints.
None of the skipped commits looks immediately suspect, but there's doubtless
some unintended change in there.
This is the first I'm hearing of it; I expect we can dig in and find a solution so we don't need to revert that whole series.
Let me add Vijendar in case it jumps out at him what went wrong.
* Can we please see the full dmesg up to the failure?
* journalctl -k -b-1 can fetch everything from the last boot up until the freeze.
* Any crash in /var/lib/systemd/pstore by chance?
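For anyone gathering the artifacts requested above, here is a minimal Python sketch, not a definitive tool. It assumes journald kept the previous boot's journal and that pstore records are archived under /var/lib/systemd/pstore; the function names and the output file name are made up for illustration.

```python
import subprocess
from pathlib import Path

# Archive directory named in the request above.
PSTORE_DIR = Path("/var/lib/systemd/pstore")


def previous_boot_kernel_log_cmd():
    """Command line for the kernel messages of the previous boot."""
    return ["journalctl", "-k", "-b", "-1", "--no-pager"]


def list_pstore_records(root=PSTORE_DIR):
    """Return every regular file below the pstore archive, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def collect(out_dir, pstore_root=PSTORE_DIR):
    """Save the previous boot's kernel log and report which pstore records exist.

    The output file name is an arbitrary choice for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        previous_boot_kernel_log_cmd(),
        capture_output=True, text=True, check=False,
    )
    log_path = out_dir / "kernel-log-previous-boot.txt"
    log_path.write_text(result.stdout)
    return log_path, list_pstore_records(pstore_root)
```

Calling collect("acp-resume-report") after rebooting from the hang would write the log and return the list of pstore files to attach; reading the journal and the pstore archive will likely need root.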
Adding Mario and leaving the context for his benefit.
To double check: if you blacklist the ACP driver, do suspend and resume work OK?
If possible can you please capture a report with https://web.git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/tree/amd_s2idle.py both in the case of ACP driver blacklisted and not blacklisted? I would like to compare.
Also, can you put all the artifacts I'm asking for somewhere non-ephemeral, like a kernel bugzilla? You can loop me and Vijendar into it.
FYI - We managed to track an S16 down and can reproduce the issue.
It's a NULL pointer deref happening on the resume path.
<1>[ 74.046372] BUG: kernel NULL pointer dereference, address: 0000000000000010
<1>[ 74.046375] #PF: supervisor read access in kernel mode
<1>[ 74.046377] #PF: error_code(0x0000) - not-present page
<6>[ 74.046380] PGD 0 P4D 0
<4>[ 74.046384] Oops: Oops: 0000 [#1] SMP NOPTI
<4>[ 74.046389] CPU: 4 UID: 0 PID: 2563 Comm: rtcwake Not tainted 6.15.0-061500rc3-generic #202504202138 PREEMPT(voluntary)
Oops#1 Part4
<4>[ 74.046394] Hardware name: ASUSTeK COMPUTER INC. ASUS Vivobook S 16 M5606KA_M5606KA/M5606KA, BIOS M5606KA.304 01/24/2025
<4>[ 74.046396] RIP: 0010:acp70_pcm_resume+0x4f/0xe0 [snd_acp70]
<4>[ 74.046405] Code: 48 89 45 d0 e8 c2 da 98 fc 49 8b 5d 50 49 39 de 75 18 eb 7b 48 89 da 4c 89 ee 4c 89 ff e8 29 cc f6 ff 48 8b 1b 4c 39 f3 74 65 <4c> 8b 7b 10 4d 85 ff 74 ef 49 8b 97 c0 00 00 00 48 85 d2 74 e3 8b
<4>[ 74.046407] RSP: 0018:ffffd12644d13880 EFLAGS: 00010286
<4>[ 74.046410] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 74.046412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
<4>[ 74.046413] RBP: ffffd12644d138b0 R08: 0000000000000000 R09: 0000000000000000
<4>[ 74.046415] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffbd774fd0
<4>[ 74.046416] R13: ffff8a9f13051e00 R14: ffff8a9f13051e50 R15: 0000000000000010
<4>[ 74.046418] FS: 0000799af9db9740(0000) GS:ffff8aa486e9d000(0000) knlGS:0000000000000000
<4>[ 74.046420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 74.046421] CR2: 0000000000000010 CR3: 000000016dfaa000 CR4: 0000000000f50ef0
<4>[ 74.046423] PKRU: 55555554
<4>[ 74.046425] Call Trace:
<4>[ 74.046427] <TASK>
<4>[ 74.046432] ? __pfx_platform_pm_resume+0x10/0x10
<4>[ 74.046439] platform_pm_resume+0x28/0x60
<4>[ 74.046443] dpm_run_callback+0x63/0x160
<4>[ 74.046447] device_resume+0x15c/0x260
<4>[ 74.046450] dpm_resume+0x15d/0x230
<4>[ 74.046453] dpm_resume_end+0x11/0x30
<4>[ 74.046456] suspend_devices_and_enter+0x1ea/0x2c0
<4>[ 74.046460] enter_state+0x223/0x560
Oops#1 Part3
<4>[ 74.046463] pm_suspend+0x4e/0x80
We'll need some more time to dig into it, but I wanted to share the trace in case it makes it jump out to anyone what's going on.
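To map the RIP line in the trace to a source line, the symbol+offset/size string can be passed to scripts/faddr2line from the kernel tree, together with a matching snd_acp70 object built with debug info. The Python sketch below only extracts and formats that string; the regex and function names are illustrative assumptions, and nothing here resolves the address itself.

```python
import re

# Matches e.g. "RIP: 0010:acp70_pcm_resume+0x4f/0xe0 [snd_acp70]".
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-f]+:"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_rip(line):
    """Split a dmesg RIP line into function, offset, size and module."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "func": m.group("func"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None for built-in code
    }


def faddr2line_arg(rip):
    """Format the parsed RIP the way scripts/faddr2line expects it."""
    return f"{rip['func']}+{rip['offset']:#x}/{rip['size']:#x}"
```

In the register dump RBX is 0 and CR2 is 0x10, which is at least consistent with reading a member at offset 0x10 through a NULL pointer. That is one reading of the registers, not a confirmed diagnosis.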
Just looking at git blame for that function, is this perhaps 8fd0e127d8da856e34391399df40b33af2b307e0?
Reverting a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90 seems to help the issue on S16 here.
Jacek - can you reproduce with that reverted?
I've managed to bisect this as closely as possible, down to the following commits:
- [f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0] ASoC: amd: acp: Refactor
acp70 platform resource structure
- [c8b5f251f0e53edab220ac4edf444120815fed3c] ASoC: amd: acp: Remove white line
- [a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90] ASoC: amd: acp: Move
spin_lock and list initialization to acp-pci driver
- [e3933683b25e2cc94485da4909e3338e1a177b39] ASoC: amd: acp: Remove
redundant acp_dev_data structure
- [aaf7a668bb3814f084f9f6f673567f6aa316632f] ASoC: amd: acp: Add new
interrupt handle callbacks in acp_common_hw_ops
Attached lspci and bisection log.
Regards,
-jacek
[1] https://bbs.archlinux.org/viewtopic.php?id=304816
git bisect start
# status: waiting for both good and bad commits
# good: [ed92bc5264c4357d4fca292c769ea9967cd3d3b6] ASoC: codecs: wm0010: Fix error handling path in wm0010_spi_probe()
git bisect good ed92bc5264c4357d4fca292c769ea9967cd3d3b6
# status: waiting for bad commit, 1 good commit known
# bad: [47c4f9b1722fd883c9745d7877cb212e41dd2715] Tidy up ASoC control get and put handlers
git bisect bad 47c4f9b1722fd883c9745d7877cb212e41dd2715
# good: [74da545ec6a8b41de96b4c350bb59dfe45c0d822] ASoC: codec: madera: use inclusive language for SND_SOC_DAIFMT_CBx_CFx
git bisect good 74da545ec6a8b41de96b4c350bb59dfe45c0d822
# bad: [a935b3f981809272d2649ad9c27a751685137846] ASoC: SOF: ipc4-topology: Allocate ref_params on stack
git bisect bad a935b3f981809272d2649ad9c27a751685137846
# good: [24056de9976dfc33801d2574c1672d91f840277a] ASoC: codecs: Update device_id tables for Realtek
git bisect good 24056de9976dfc33801d2574c1672d91f840277a
# good: [a1462fb8b5dd1018e3477a6861822d75c6a59449] ASoC: Intel: boards: updates for 6.15
git bisect good a1462fb8b5dd1018e3477a6861822d75c6a59449
# skip: [8a7e7a03e3c53cd9abbbf233899cc2e05b2c6ec0] ASoC: SOF: Intel: Add support for ACE3+ mic privacy
git bisect skip 8a7e7a03e3c53cd9abbbf233899cc2e05b2c6ec0
# skip: [aaf7a668bb3814f084f9f6f673567f6aa316632f] ASoC: amd: acp: Add new interrupt handle callbacks in acp_common_hw_ops
git bisect skip aaf7a668bb3814f084f9f6f673567f6aa316632f
# good: [c6141ba0110f98266106699aca071fed025c3d64] ASoC: Merge up fixes
git bisect good c6141ba0110f98266106699aca071fed025c3d64
# skip: [ad5a0970f86d82e39ebd06d45a1f7aa48a1316f8] ASoC: cs35l41: check the return value from spi_setup()
git bisect skip ad5a0970f86d82e39ebd06d45a1f7aa48a1316f8
# good: [269b844239149a9bbaba66518db99ebb06554a15] ASoC: dapm: Fix changes to DECLARE_ADAU17X1_DSP_MUX_CTRL
git bisect good 269b844239149a9bbaba66518db99ebb06554a15
# skip: [89be3c15a58b2ccf31e969223c8ac93ca8932d81] ASoC: qcom: sm8250: explicitly set format in sm8250_be_hw_params_fixup()
git bisect skip 89be3c15a58b2ccf31e969223c8ac93ca8932d81
# bad: [02e1cf7a352a3ba5f768849f2b4fcaaaa19f89e3] ASoC: amd: acp: Fix for enabling DMIC on acp platforms via _DSD entry
git bisect bad 02e1cf7a352a3ba5f768849f2b4fcaaaa19f89e3
# good: [7a2ff0510c51462c0a979f5006d375a2b23d46e9] ASoC: soc-pcm: reuse dpcm_state_string()
git bisect good 7a2ff0510c51462c0a979f5006d375a2b23d46e9
# good: [a8fed0bddf8fa239fc71dc5c035d2e078c597369] ASoC: dt-bindings: add regulator support to dmic codec
git bisect good a8fed0bddf8fa239fc71dc5c035d2e078c597369
# bad: [ee7ab0fd540877fceb3d51f87016e6531d86406f] ASoC: amd: acp: Refactor rembrant platform resource structure
git bisect bad ee7ab0fd540877fceb3d51f87016e6531d86406f
# good: [0d2d276f53ea3ba1686619cde503d9748f58a834] ASoC: SOF: Intel: lnl/ptl: Only set dsp_ops which differs from MTL
git bisect good 0d2d276f53ea3ba1686619cde503d9748f58a834
# good: [8aeb7d2c3fc315e629d252cd601598a5af74bbb0] ASoC: SOF: Intel: Create ptl.c as placeholder for Panther Lake features
git bisect good 8aeb7d2c3fc315e629d252cd601598a5af74bbb0
# skip: [ac5b4a24f16f2f56b5cc5092969930b867274edc] ASoC: Intel: soc-acpi-intel-ptl-match: Add cs42l43 support
git bisect skip ac5b4a24f16f2f56b5cc5092969930b867274edc
# skip: [8ae746fe51041484e52eba99bed15a444c7d4372] ASoC: amd: acp: Implement acp_common_hw_ops support for acp platforms
git bisect skip 8ae746fe51041484e52eba99bed15a444c7d4372
# good: [0978e8207b61ac6d51280e5d28ccfff75d653363] ASoC: SOF: Intel: hda-mlink: Add support for mic privacy in VS SHIM registers
git bisect good 0978e8207b61ac6d51280e5d28ccfff75d653363
# good: [4a43c3241ec3465a501825ecaf051e5a1d85a60b] ASoC: SOF: Intel: ptl: Add support for mic privacy
git bisect good 4a43c3241ec3465a501825ecaf051e5a1d85a60b
# skip: [1ec3f1dc215d4b3d3679ecdc4a549d4e82b3a609] ASoC: dmic: add regulator support
git bisect skip 1ec3f1dc215d4b3d3679ecdc4a549d4e82b3a609
# good: [e2cda461765692757cd5c3b1fc80bd260ffe1394] ASoC: amd: acp: Refactor dmic-codec platform device creation
git bisect good e2cda461765692757cd5c3b1fc80bd260ffe1394
# skip: [a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90] ASoC: amd: acp: Move spin_lock and list initialization to acp-pci driver
git bisect skip a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90
# bad: [f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0] ASoC: amd: acp: Refactor acp70 platform resource structure
git bisect bad f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0
# good: [6e60db74b69c29b528c8d10d86108f78f2995dcb] ASoC: amd: acp: Refactor acp machine select
git bisect good 6e60db74b69c29b528c8d10d86108f78f2995dcb
# skip: [e3933683b25e2cc94485da4909e3338e1a177b39] ASoC: amd: acp: Remove redundant acp_dev_data structure
git bisect skip e3933683b25e2cc94485da4909e3338e1a177b39
# skip: [c8b5f251f0e53edab220ac4edf444120815fed3c] ASoC: amd: acp: Remove white line
git bisect skip c8b5f251f0e53edab220ac4edf444120815fed3c
# only skipped commits left to test
# possible first bad commit: [f8b4f3f525e82d78079a6ebbde68e4a0d79fd1c0] ASoC: amd: acp: Refactor acp70 platform resource structure
# possible first bad commit: [c8b5f251f0e53edab220ac4edf444120815fed3c] ASoC: amd: acp: Remove white line
# possible first bad commit: [a95a1dbbd3d64adf392fed13c8eef4f72b4e5b90] ASoC: amd: acp: Move spin_lock and list initialization to acp-pci driver
# possible first bad commit: [e3933683b25e2cc94485da4909e3338e1a177b39] ASoC: amd: acp: Remove redundant acp_dev_data structure
# possible first bad commit: [aaf7a668bb3814f084f9f6f673567f6aa316632f] ASoC: amd: acp: Add new interrupt handle callbacks in acp_common_hw_ops
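Since the bisection ended with only skipped commits left, it can help to summarize the log mechanically before deciding what to revert. The following is a rough Python sketch that reads a log in the format shown above; the regexes and the function name are assumptions for illustration, not part of git.

```python
import re
from collections import defaultdict

# "# good: [<sha>] <subject>", "# bad: [...]", "# skip: [...]"
ENTRY_RE = re.compile(r"^# (good|bad|skip): \[([0-9a-f]+)\] (.*)$")
# "# possible first bad commit: [<sha>] <subject>"
CANDIDATE_RE = re.compile(r"^# possible first bad commit: \[([0-9a-f]+)\] (.*)$")


def summarize_bisect_log(text):
    """Group commits by verdict and list the remaining first-bad candidates."""
    verdicts = defaultdict(list)
    candidates = []
    for raw in text.splitlines():
        line = raw.strip()
        m = ENTRY_RE.match(line)
        if m:
            verdicts[m.group(1)].append((m.group(2), m.group(3)))
            continue
        m = CANDIDATE_RE.match(line)
        if m:
            candidates.append((m.group(1), m.group(2)))
    return dict(verdicts), candidates
```

Feeding it the log above and printing the candidates gives the list of commits that would each need a targeted revert test, such as the a95a1dbbd3d6 revert mentioned earlier in the thread.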
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Root Complex [1022:1507]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo IOMMU [1022:1508]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo PCIe USB4 Bridge [1022:150a]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo GPP Bridge [1022:150b]
00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo GPP Bridge [1022:150b]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Dummy Host Bridge [1022:1509]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Internal GPP Bridge to Bus [C:A] [1022:150c]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Internal GPP Bridge to Bus [C:A] [1022:150c]
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo Internal GPP Bridge to Bus [C:A] [1022:150c]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 0 [1022:16f8]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 1 [1022:16f9]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 2 [1022:16fa]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 3 [1022:16fb]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 4 [1022:16fc]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 5 [1022:16fd]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 6 [1022:16fe]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Strix Data Fabric; Function 7 [1022:16ff]
61:00.0 Non-Volatile memory controller [0108]: Micron Technology Inc 2400 NVMe SSD (DRAM-less) [1344:5413] (rev 03)
62:00.0 Network controller [0280]: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter [14c3:0616]
63:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Strix [Radeon 880M / 890M] [1002:150e] (rev c1)
63:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
63:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Strix/Krackan/Strix Halo CCP/ASP [1022:17e0]
63:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151e]
63:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 70)
63:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller [1022:15e3]
64:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo PCIe Dummy Function [1022:150d]
64:00.1 Signal processing controller [1180]: Advanced Micro Devices, Inc. [AMD] Strix/Krackan/Strix Halo Neural Processing Unit [1022:17f0] (rev 10)
65:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151f]
65:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151a]
65:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151b]
65:00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:151c]
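To pick the audio coprocessor out of a listing like the one above (the 63:00.5 entry, [1022:15e2] rev 70, presumably the device the snd_acp70 driver in the trace binds to), a small Python sketch for parsing `lspci -nn` lines could look like this. The regex and helper names are assumptions made for illustration and only cover the line shape seen in this dump.

```python
import re

# One "lspci -nn" line: slot, class name [class id]: device name [vendor:device] (rev xx)
LSPCI_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<cls>.+?) \[(?P<class_id>[0-9a-f]{4})\]: "
    r"(?P<name>.+) \[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
    r"(?: \(rev (?P<rev>[0-9a-f]{2})\))?$"
)


def parse_lspci_line(line):
    """Return the fields of one lspci -nn line as a dict, or None if it does not match."""
    m = LSPCI_RE.match(line.strip())
    return m.groupdict() if m else None


def find_devices(text, vendor, device):
    """Return all parsed entries whose PCI ID equals vendor:device."""
    found = []
    for line in text.splitlines():
        dev = parse_lspci_line(line)
        if dev and dev["vendor"] == vendor and dev["device"] == device:
            found.append(dev)
    return found
```

For example, find_devices(listing, "1022", "15e2") on the dump above would return the single 63:00.5 entry, whose rev field can then be compared across affected machines.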