Re: BUG: crash after suspending

From: Heikki Krogerus
Date: Wed Jan 25 2023 - 03:33:30 EST


Hi,

On Wed, Jan 25, 2023 at 01:23:01PM +0800, Fan Chengwei wrote:
> My laptop crashes after suspending due to a failure in the USB Type-C driver.
>
> This may be a follow-up to
> https://bugzilla.kernel.org/show_bug.cgi?id=216706 and
> https://bugzilla.kernel.org/show_bug.cgi?id=216697. You mentioned a patch in
> BUG 216697 above, which was merged into mainline in 6.2.0-rc5. But I tried
> 6.2.0-rc5 yesterday and it still doesn't work.
>
> I added a comment at
> https://bugzilla.kernel.org/show_bug.cgi?id=216697#c11 and there is some
> discussion in https://bbs.archlinux.org/viewtopic.php?id=282999.
>
> Bisection shows:
> > # first bad commit: [4e3a50293c2b21961f02e1afa2f17d3a1a90c7c8] usb: typec: ucsi: acpi: Implement resume callback
> This commit makes my laptop freeze when resuming from suspend. In
> https://bugzilla.kernel.org/show_bug.cgi?id=216706, someone reports that the
> same commit causes the USBC resume callback to take far too long.
>
> That commit was merged into mainline in 6.1.0-rc2. With it, my laptop
> freezes and journalctl can't capture any messages. After the Arch Linux
> kernel package was upgraded to 6.1.1 and later, the system no longer hangs,
> but an oops is displayed instead. I posted it at
> https://bbs.archlinux.org/viewtopic.php?id=282999. The following is the
> dmesg from 6.2.0-rc5:
> > [ 29.677975] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 29.677981] CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.2.0-rc5-1-mainline #1 9dd3e34c332001c1d20c681d031ef729664f899d
> > [ 29.677989] Hardware name: LENOVO 81HX/LNVNB161216, BIOS 6UCN53WW(V4.08) 09/26/2018
> > [ 29.677992] Workqueue: events_long ucsi_resume_work [typec_ucsi]
> > [ 29.678017] RIP: 0010:ucsi_resume_work+0x32/0x80 [typec_ucsi]
> > [ 29.678037] Code: 00 55 31 c9 31 d2 53 48 8b b7 a0 00 00 00 48 89 fb 48 83 ef 38 48 83 ce 05 e8 5a f6 ff ff 85 c0 0f 88 95 22 00 00 48 8b 5b f8 <48> 83 bb 88 00 00 00 00 74 3b 48 8d 6b 10 48 89 ef e8 f8 57 a6 e2
> > [ 29.678041] RSP: 0000:ffffb2dac030fe80 EFLAGS: 00010246
> > [ 29.678047] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > [ 29.678050] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff91b009189db8
> > [ 29.678053] RBP: ffff91b169f32b00 R08: 0000000000000001 R09: 0000000000000000
> > [ 29.678056] R10: 0000000000000004 R11: 0000000000000000 R12: ffff91b169f38b00
> > [ 29.678059] R13: 0000000000000000 R14: ffff91b000f5dc00 R15: ffff91b009189d40
> > [ 29.678063] FS: 0000000000000000(0000) GS:ffff91b169f00000(0000) knlGS:0000000000000000
> > [ 29.678067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 29.678071] CR2: 0000000000000088 CR3: 000000005e810001 CR4: 00000000003706e0
> > [ 29.678075] Call Trace:
> > [ 29.678080] <TASK>
> > [ 29.678085] process_one_work+0x1c5/0x380
> > [ 29.678099] worker_thread+0x51/0x390
> > [ 29.678109] ? __pfx_worker_thread+0x10/0x10
> > [ 29.678117] kthread+0xdb/0x110
> > [ 29.678124] ? __pfx_kthread+0x10/0x10
> > [ 29.678130] ret_from_fork+0x29/0x50
> > [ 29.678146] </TASK>
> > [ 29.678148] Modules linked in: nft_chain_nat xt_REDIRECT nf_nat
> nf_conntrack xt_mark nft_compat nf_tables libcrc32c nfnetlink snd_soc_avs
> xt_TPROXY snd_soc_hda_codec nf_tproxy_ipv6 snd_soc_skl nf_tproxy_ipv4
> nf_defrag_ipv6 snd_soc_hdac_hda nf_defrag_ipv4 snd_hda_ext_core
> snd_soc_sst_ipc intel_tcc_cooling snd_soc_sst_dsp x86_pkg_temp_thermal
> snd_soc_acpi_intel_match intel_powerclamp kvm_intel snd_soc_acpi
> snd_soc_core ccm snd_hda_codec_hdmi algif_aead snd_compress
> snd_hda_codec_conexant kvm snd_hda_codec_generic ac97_bus cbc ledtrig_audio
> irqbypass ath10k_pci snd_pcm_dmaengine crct10dif_pclmul crc32_pclmul
> hid_logitech_hidpp polyval_clmulni des_generic snd_hda_intel polyval_generic
> libdes gf128mul snd_intel_dspcfg ath10k_core ecb ghash_clmulni_intel
> snd_intel_sdw_acpi sha512_ssse3 iTCO_wdt snd_hda_codec ath intel_pmc_bxt
> aesni_intel algif_skcipher uvcvideo cmac joydev snd_hda_core serio_raw
> crypto_simd snd_hwdep iTCO_vendor_support mei_hdcp mousedev mei_pxp
> intel_rapl_msr 8021q atkbd
> > [ 29.678263] cryptd hid_logitech_dj garp libps2 md4 btusb
> videobuf2_vmalloc snd_pcm mrp mac80211 rapl vivaldi_fmap algif_hash
> videobuf2_memops processor_thermal_device_pci_legacy btrtl snd_timer r8169
> btbcm stp intel_cstate af_alg llc videobuf2_v4l2 snd coretemp
> processor_thermal_device i2c_i801 libarc4 intel_uncore btintel realtek btmtk
> intel_wmi_thunderbolt wmi_bmof mdio_devres i2c_smbus soundcore ucsi_acpi
> mei_me processor_thermal_rfim cfg80211 bluetooth videodev libphy typec_ucsi
> processor_thermal_mbox intel_lpss_pci vfat i2c_hid_acpi mei
> processor_thermal_rapl intel_lpss fat idma64 typec ecdh_generic
> videobuf2_common i2c_hid intel_xhci_usb_role_switch intel_rapl_common mc
> usbhid intel_soc_dts_iosf intel_pch_thermal roles elan_i2c ideapad_laptop
> sparse_keymap platform_profile int3403_thermal rfkill int340x_thermal_zone
> i8042 serio int3400_thermal soc_button_array acpi_thermal_rel acpi_pad
> mac_hid vmw_vmci pkcs8_key_parser dm_multipath crypto_user fuse bpf_preload
> ip_tables x_tables
> > [ 29.678388] ext4 crc32c_generic crc16 mbcache jbd2 dm_mod nvme
> nvme_core crc32c_intel xhci_pci nvme_common xhci_pci_renesas i915 drm_buddy
> intel_gtt video wmi drm_display_helper cec ttm
> > [ 29.678420] CR2: 0000000000000088
> > [ 29.678424] ---[ end trace 0000000000000000 ]---
> > [ 29.678426] RIP: 0010:ucsi_resume_work+0x32/0x80 [typec_ucsi]
> > [ 29.678445] Code: 00 55 31 c9 31 d2 53 48 8b b7 a0 00 00 00 48 89 fb 48 83 ef 38 48 83 ce 05 e8 5a f6 ff ff 85 c0 0f 88 95 22 00 00 48 8b 5b f8 <48> 83 bb 88 00 00 00 00 74 3b 48 8d 6b 10 48 89 ef e8 f8 57 a6 e2
> > [ 29.678449] RSP: 0000:ffffb2dac030fe80 EFLAGS: 00010246
> > [ 29.678454] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
> > [ 29.678457] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff91b009189db8
> > [ 29.678460] RBP: ffff91b169f32b00 R08: 0000000000000001 R09: 0000000000000000
> > [ 29.678462] R10: 0000000000000004 R11: 0000000000000000 R12: ffff91b169f38b00
> > [ 29.678465] R13: 0000000000000000 R14: ffff91b000f5dc00 R15: ffff91b009189d40
> > [ 29.678468] FS: 0000000000000000(0000) GS:ffff91b169f00000(0000) knlGS:0000000000000000
> > [ 29.678472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 29.678475] CR2: 0000000000000088 CR3: 000000005e810001 CR4: 00000000003706e0
> Although you merged a patch in 6.2.0-rc5, nothing seems to have changed.
>
> I also noticed one more thing: ever since I started using Linux on this
> machine, this message has appeared on every boot:
> > ucsi_acpi USBC000:00: PPM init failed (-16)
> The return value varies: about 70% of the time it is -16, 20% -19, 10%
> -110, and sporadically -22 or -95. In the past this did not cause any
> problems, so I thought it was completely harmless. But once, while I was
> testing a kernel, this error did not appear and the system resumed normally
> from suspend, even though it was a bad kernel. This happens about once in
> 150 boots, and I can't reproduce it. Also, even on a bad kernel, if I run
> `rmmod ucsi_acpi typec_ucsi` after logging in and then modprobe them again,
> the system resumes normally.
>
> I'm not an experienced Linux user and don't know much about this, so I hope
> you can help.
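
As a reading aid for the oops quoted above, here is a small sketch (Python,
not part of the original report) that decodes the faulting instruction from
the Code: line. The bytes starting at the <48> marker, the RBX value, and CR2
are copied from the trace. The helper name decode_rexw_83 is made up for
illustration, and it only handles this one encoding (REX.W prefix, opcode
0x83, disp32 memory operand without a SIB byte).

```python
# Bytes at the faulting RIP, copied from the "Code:" line (<48> marks the start).
FAULTING_BYTES = bytes.fromhex("48 83 bb 88 00 00 00 00")
RBX = 0x0000000000000000  # from the register dump in the oops
CR2 = 0x0000000000000088  # faulting address reported by the oops

GROUP1_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_rexw_83(code: bytes):
    """Decode 'REX.W 83 /r ib' with a [reg + disp32] operand (hypothetical helper)."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x83:
        raise ValueError("only REX.W + opcode 0x83 is handled in this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [reg + disp32] without SIB is handled")
    disp = int.from_bytes(code[3:7], "little", signed=True)
    imm = int.from_bytes(code[7:8], "little", signed=True)
    return GROUP1_OPS[reg], REGS[rm], disp, imm


op, base, disp, imm = decode_rexw_83(FAULTING_BYTES)
print(f"{op}q ${imm:#x},{disp:#x}(%{base})")
print(f"effective address with RBX={RBX:#x}: {RBX + disp:#x}, CR2: {CR2:#x}")
```

If this reading is right, the instruction is cmpq $0x0,0x88(%rbx) with RBX
equal to zero, so the access to address 0x88 matches CR2 and looks like a
field read through a NULL pointer in ucsi_resume_work. Which structure that
pointer belongs to cannot be told from the trace alone.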

The information you just gave is very useful. Thank you!
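
For reference, the negative values in the PPM init failed message are kernel
error codes. A minimal lookup sketch, assuming the generic Linux errno
numbering used on x86 (the table is hard-coded here, not taken from the
driver, and describe_ret is a hypothetical helper):

```python
# Linux errno numbers (generic numbering, as used on x86) for the values
# mentioned in the report. Hard-coded so the result does not depend on the
# platform this script runs on.
LINUX_ERRNO = {
    16: ("EBUSY", "Device or resource busy"),
    19: ("ENODEV", "No such device"),
    22: ("EINVAL", "Invalid argument"),
    95: ("EOPNOTSUPP", "Operation not supported"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def describe_ret(ret: int) -> str:
    """Map a negative kernel return value to its errno name and description."""
    name_and_text = LINUX_ERRNO.get(-ret)
    if name_and_text is None:
        return f"unknown errno {-ret}"
    name, text = name_and_text
    return f"{name}: {text}"


for ret in (-16, -19, -110, -22, -95):
    print(ret, describe_ret(ret))
```

So the most common value in the report, -16, corresponds to EBUSY, and -110
to ETIMEDOUT.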

I'm still not completely sure if I understand the problem, but I'm
attaching a patch. Can you test it?

I'll also attach it to the bug.

thanks,

--
heikki