BUG: crash after suspending

From: Fan Chengwei
Date: Wed Jan 25 2023 - 00:23:16 EST


Dear Linux Developer,

My laptop crashes when resuming from suspend because of a failure in the USB Type-C driver.

This may be a follow-up to https://bugzilla.kernel.org/show_bug.cgi?id=216706 and https://bugzilla.kernel.org/show_bug.cgi?id=216697. You mentioned a patch in bug 216697, which was merged into mainline in 6.2.0-rc5. I tried 6.2.0-rc5 yesterday, and the problem is still there.

I added a comment at https://bugzilla.kernel.org/show_bug.cgi?id=216697#c11, and there is some discussion at https://bbs.archlinux.org/viewtopic.php?id=282999.

Bisection shows:
# first bad commit: [4e3a50293c2b21961f02e1afa2f17d3a1a90c7c8] usb: typec: ucsi: acpi: Implement resume callback
This commit makes my laptop freeze when resuming from suspend. In https://bugzilla.kernel.org/show_bug.cgi?id=216706, someone reports that the same commit makes the USBC resume callback take far too long.

The bad commit was merged into mainline in 6.1.0-rc2. With it, my laptop freezes and journalctl cannot capture any messages. After the Arch Linux kernel package was upgraded to 6.1.1 and later, the system no longer hangs, but an oops is displayed; I posted it at https://bbs.archlinux.org/viewtopic.php?id=282999. The following is the dmesg output from 6.2.0-rc5:
> [ 29.677975] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 29.677981] CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.2.0-rc5-1-mainline #1 9dd3e34c332001c1d20c681d031ef729664f899d
> [ 29.677989] Hardware name: LENOVO 81HX/LNVNB161216, BIOS 6UCN53WW(V4.08) 09/26/2018
> [ 29.677992] Workqueue: events_long ucsi_resume_work [typec_ucsi]
> [ 29.678017] RIP: 0010:ucsi_resume_work+0x32/0x80 [typec_ucsi]
> [ 29.678037] Code: 00 55 31 c9 31 d2 53 48 8b b7 a0 00 00 00 48 89 fb 48 83 ef 38 48 83 ce 05 e8 5a f6 ff ff 85 c0 0f 88 95 22 00 00 48 8b 5b f8 <48> 83 bb 88 00 00 00 00 74 3b 48 8d 6b 10 48 89 ef e8 f8 57 a6 e2
> [ 29.678041] RSP: 0000:ffffb2dac030fe80 EFLAGS: 00010246
> [ 29.678047] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> [ 29.678050] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff91b009189db8
> [ 29.678053] RBP: ffff91b169f32b00 R08: 0000000000000001 R09: 0000000000000000
> [ 29.678056] R10: 0000000000000004 R11: 0000000000000000 R12: ffff91b169f38b00
> [ 29.678059] R13: 0000000000000000 R14: ffff91b000f5dc00 R15: ffff91b009189d40
> [ 29.678063] FS: 0000000000000000(0000) GS:ffff91b169f00000(0000) knlGS:0000000000000000
> [ 29.678067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 29.678071] CR2: 0000000000000088 CR3: 000000005e810001 CR4: 00000000003706e0
> [ 29.678075] Call Trace:
> [ 29.678080] <TASK>
> [ 29.678085] process_one_work+0x1c5/0x380
> [ 29.678099] worker_thread+0x51/0x390
> [ 29.678109] ? __pfx_worker_thread+0x10/0x10
> [ 29.678117] kthread+0xdb/0x110
> [ 29.678124] ? __pfx_kthread+0x10/0x10
> [ 29.678130] ret_from_fork+0x29/0x50
> [ 29.678146] </TASK>
> [ 29.678148] Modules linked in: nft_chain_nat xt_REDIRECT nf_nat nf_conntrack xt_mark nft_compat nf_tables libcrc32c nfnetlink snd_soc_avs xt_TPROXY snd_soc_hda_codec nf_tproxy_ipv6 snd_soc_skl nf_tproxy_ipv4 nf_defrag_ipv6 snd_soc_hdac_hda nf_defrag_ipv4 snd_hda_ext_core snd_soc_sst_ipc intel_tcc_cooling snd_soc_sst_dsp x86_pkg_temp_thermal snd_soc_acpi_intel_match intel_powerclamp kvm_intel snd_soc_acpi snd_soc_core ccm snd_hda_codec_hdmi algif_aead snd_compress snd_hda_codec_conexant kvm snd_hda_codec_generic ac97_bus cbc ledtrig_audio irqbypass ath10k_pci snd_pcm_dmaengine crct10dif_pclmul crc32_pclmul hid_logitech_hidpp polyval_clmulni des_generic snd_hda_intel polyval_generic libdes gf128mul snd_intel_dspcfg ath10k_core ecb ghash_clmulni_intel snd_intel_sdw_acpi sha512_ssse3 iTCO_wdt snd_hda_codec ath intel_pmc_bxt aesni_intel algif_skcipher uvcvideo cmac joydev snd_hda_core serio_raw crypto_simd snd_hwdep iTCO_vendor_support mei_hdcp mousedev mei_pxp intel_rapl_msr 8021q atkbd
> [ 29.678263] cryptd hid_logitech_dj garp libps2 md4 btusb videobuf2_vmalloc snd_pcm mrp mac80211 rapl vivaldi_fmap algif_hash videobuf2_memops processor_thermal_device_pci_legacy btrtl snd_timer r8169 btbcm stp intel_cstate af_alg llc videobuf2_v4l2 snd coretemp processor_thermal_device i2c_i801 libarc4 intel_uncore btintel realtek btmtk intel_wmi_thunderbolt wmi_bmof mdio_devres i2c_smbus soundcore ucsi_acpi mei_me processor_thermal_rfim cfg80211 bluetooth videodev libphy typec_ucsi processor_thermal_mbox intel_lpss_pci vfat i2c_hid_acpi mei processor_thermal_rapl intel_lpss fat idma64 typec ecdh_generic videobuf2_common i2c_hid intel_xhci_usb_role_switch intel_rapl_common mc usbhid intel_soc_dts_iosf intel_pch_thermal roles elan_i2c ideapad_laptop sparse_keymap platform_profile int3403_thermal rfkill int340x_thermal_zone i8042 serio int3400_thermal soc_button_array acpi_thermal_rel acpi_pad mac_hid vmw_vmci pkcs8_key_parser dm_multipath crypto_user fuse bpf_preload ip_tables x_tables
> [ 29.678388] ext4 crc32c_generic crc16 mbcache jbd2 dm_mod nvme nvme_core crc32c_intel xhci_pci nvme_common xhci_pci_renesas i915 drm_buddy intel_gtt video wmi drm_display_helper cec ttm
> [ 29.678420] CR2: 0000000000000088
> [ 29.678424] ---[ end trace 0000000000000000 ]---
> [ 29.678426] RIP: 0010:ucsi_resume_work+0x32/0x80 [typec_ucsi]
> [ 29.678445] Code: 00 55 31 c9 31 d2 53 48 8b b7 a0 00 00 00 48 89 fb 48 83 ef 38 48 83 ce 05 e8 5a f6 ff ff 85 c0 0f 88 95 22 00 00 48 8b 5b f8 <48> 83 bb 88 00 00 00 00 74 3b 48 8d 6b 10 48 89 ef e8 f8 57 a6 e2
> [ 29.678449] RSP: 0000:ffffb2dac030fe80 EFLAGS: 00010246
> [ 29.678454] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> [ 29.678457] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff91b009189db8
> [ 29.678460] RBP: ffff91b169f32b00 R08: 0000000000000001 R09: 0000000000000000
> [ 29.678462] R10: 0000000000000004 R11: 0000000000000000 R12: ffff91b169f38b00
> [ 29.678465] R13: 0000000000000000 R14: ffff91b000f5dc00 R15: ffff91b009189d40
> [ 29.678468] FS: 0000000000000000(0000) GS:ffff91b169f00000(0000) knlGS:0000000000000000
> [ 29.678472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 29.678475] CR2: 0000000000000088 CR3: 000000005e810001 CR4: 00000000003706e0
Although you merged a patch in 6.2.0-rc5, nothing seems to have changed.
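
A note on reading the oops above: CR2 holds the faulting address (0x88) and RBX is 0, while the bytes marked `<48> 83 bb 88 00 00 00 00` in the Code: line decode to a compare of the 8 bytes at RBX + 0x88 against zero. This suggests, but does not prove, a NULL pointer dereference at offset 0x88 inside ucsi_resume_work. The minimal Python sketch below only checks that arithmetic from two lines of the log; the helper name `parse_registers` is hypothetical and not part of any kernel tool.

```python
import re

# Two lines copied from the oops above.
OOPS = """\
[ 29.678047] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 29.678071] CR2: 0000000000000088 CR3: 000000005e810001 CR4: 00000000003706e0
"""

def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

regs = parse_registers(OOPS)

# Offset of the faulting address relative to the base register RBX.
offset = regs["CR2"] - regs["RBX"]
print(f"fault address {regs['CR2']:#x} = RBX ({regs['RBX']:#x}) + {offset:#x}")
```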

I also noticed one more thing: ever since I started using Linux, every boot shows this message:
> ucsi_acpi USBC000:00: PPM init failed (-16)
The return value varies: about 70% of the time it is -16, 20% -19, 10% -110, and sporadically -22 or -95. In the past this did not cause any problems, so I thought it was completely harmless. But once, while I was testing kernels, this error did not appear and the system resumed normally from suspend, even though it was a bad kernel. This happens about once in 150 boots, and I can't reproduce it. Besides, even on a bad kernel, if I run `rmmod ucsi_acpi typec_ucsi` after logging in and then modprobe them again, the system can resume normally.
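
For reference, the numbers in the "PPM init failed" message are negated Linux errno codes. The following is a minimal Python sketch that decodes the values listed above. The table is written out by hand from the generic Linux errno headers instead of relying on the host's `errno` module, and the helper name `decode_kernel_error` is hypothetical.

```python
# Linux errno values (asm-generic errno-base.h / errno.h), limited to the
# codes seen in the "PPM init failed" messages.
LINUX_ERRNO = {
    16: ("EBUSY", "Device or resource busy"),
    19: ("ENODEV", "No such device"),
    22: ("EINVAL", "Invalid argument"),
    95: ("EOPNOTSUPP", "Operation not supported on transport endpoint"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

def decode_kernel_error(ret):
    """Map a negative kernel return value such as -16 to (name, description)."""
    return LINUX_ERRNO.get(-ret, ("UNKNOWN", "not in this table"))

for ret in (-16, -19, -110, -22, -95):
    name, text = decode_kernel_error(ret)
    print(f"PPM init failed ({ret}): {name}, {text}")
```

So the most common failure (-16) is EBUSY and the next most common (-19) is ENODEV; what these mean for this particular driver is not something the table alone can tell.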

I'm not an experienced Linux user and don't know much about this, so I hope to get your help.

Best,
Chengwei