Re: Possible regression in 5.16-stable

From: Michele Ballabio
Date: Mon Apr 04 2022 - 18:08:24 EST


On Mon, 4 Apr 2022 09:12:41 +0200
Thorsten Leemhuis <regressions@xxxxxxxxxxxxx> wrote:

> > Kernels 5.16.10 do not have the following regression, 5.16.11-16
>
> 5.16.11-16 sounds like this is a distro kernel that might or might not
> be patched. Or is 11-16 just meant as a range? Could you clarify?

Sorry, I meant the problem occurred on 5.16.11, .12 and .16.

> > do. My machine would freeze completely about once a week, no oops in
> > the logs, sysrq won't work either. I managed to log only the
> > following (and only once) with netconsole, while running kernel
> > 5.16.16. I could not reproduce the problem since.
>
> Hmmm. Of course ideally all regressions get fixed, but that being
> said: 5.16 will likely be EOL in round about two weeks anyway and
> getting to the root of this problem might take some time and effort.
> That's why I'm not sure myself what's the best way forward here.

I'm aware of this, but given the nature of the problem and how difficult
it is to reproduce, I thought it was better to report it.
Meanwhile I've moved to 5.17.1, so let's consider this on hold until
someone hits a similar problem with 5.17.x.

> Maybe testing 5.17 to see if the problem still shows up would be
> good; bisection would help, but I guess that will be hard here. But I
> guess there is one thing that could help: could you maybe decode the
> panic you have as described in this document:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

Thanks, I tried, but I'm not sure the result is of any help:

----------
0,1493,12767657117,-;traps: PANIC: double fault, error_code: 0x0
4,1494,12767657121,-;double fault: 0000 [#1] PREEMPT SMP NOPTI
4,1496,12767657126,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
4,1497,12767657127,-;RIP: entry_SYSCALL_64+0x3/0x29
4,1498,12767657133,-;Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 01 f8 <65> 48 89 24 25 14 60 00 00 eb 12 0f 20 dc 0f 1f 44 00 00 48 81 e4
All code
========
0: cc int3
1: cc int3
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: cc int3
7: cc int3
8: cc int3
9: cc int3
a: cc int3
b: cc int3
c: cc int3
d: cc int3
e: cc int3
f: cc int3
10: cc int3
11: cc int3
12: cc int3
13: cc int3
14: cc int3
15: cc int3
16: cc int3
17: cc int3
18: cc int3
19: cc int3
1a: cc int3
1b: cc int3
1c: cc int3
1d: cc int3
1e: cc int3
1f: cc int3
20: cc int3
21: cc int3
22: cc int3
23: cc int3
24: cc int3
25: cc int3
26: cc int3
27: 0f 01 f8 swapgs
2a:* 65 48 89 24 25 14 60 mov %rsp,%gs:0x6014 <-- trapping instruction
31: 00 00
33: eb 12 jmp 0x47
35: 0f 20 dc mov %cr3,%rsp
38: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3d: 48 rex.W
3e: 81 .byte 0x81
3f: e4 .byte 0xe4

Code starting with the faulting instruction
===========================================
0: 65 48 89 24 25 14 60 mov %rsp,%gs:0x6014
7: 00 00
9: eb 12 jmp 0x1d
b: 0f 20 dc mov %cr3,%rsp
e: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
13: 48 rex.W
14: 81 .byte 0x81
15: e4 .byte 0xe4
4,1499,12767657134,-;RSP: 0018:00007f2a8bcbd438 EFLAGS: 00010002
4,1500,12767657136,-;RAX: 00000000000000ca RBX: 000000000000005d RCX: 00007f2aa45e8aab
4,1501,12767657138,-;RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f2aa4400018
4,1502,12767657139,-;RBP: 00007f2aa4400018 R08: 0000000000000000 R09: 00007f2a8ed00000
4,1503,12767657140,-;R10: 0000000000000000 R11: 0000000000000282 R12: 00000000000000a8
4,1504,12767657141,-;R13: 0000000000000003 R14: 0000000000000030 R15: 00007f2aa4400000
4,1505,12767657142,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
4,1506,12767657143,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1507,12767657144,-;CR2: 00007f2a8bcbd428 CR3: 00000002953f2000 CR4: 00000000003506e0
4,1508,12767657146,-;Call Trace:
4,1509,12767657146,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables
x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper
ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi
wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul
crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co
4,1509,12767657146,-,ncfrag=966/986;re rapl k10temp wmi
4,1510,12767657189,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
4,1511,12767657207,-;------------[ cut here ]------------
4,1512,12767657207,-;WARNING: CPU: 4 PID: 16786 at kernel/softirq.c:362 __local_bh_enable_ip+0x43/0x70
4,1513,12767657212,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables
x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper
ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi
wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul
crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co
4,1513,12767657212,-,ncfrag=966/986;re rapl k10temp wmi
4,1514,12767657248,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
4,1516,12767657254,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
4,1517,12767657255,-;RIP: __local_bh_enable_ip+0x43/0x70
4,1518,12767657257,-;Code: 01 35 61 1d f3 7d 65 8b 05 5a 1d f3 7d a9 00 ff ff 00 74 1a bf 01 00 00 00 e8 99 b5 02 00 65 8b 05 42 1d f3 7d 85 c0 74 25 c3 <0f> 0b eb cc 48 c7 c7 d9 53 42 83 e8 4d ec a6 00 65 66 8b 05 25 19
All code
========
0: 01 35 61 1d f3 7d add %esi,0x7df31d61(%rip) # 0x7df31d67
6: 65 8b 05 5a 1d f3 7d mov %gs:0x7df31d5a(%rip),%eax # 0x7df31d67
d: a9 00 ff ff 00 test $0xffff00,%eax
12: 74 1a je 0x2e
14: bf 01 00 00 00 mov $0x1,%edi
19: e8 99 b5 02 00 call 0x2b5b7
1e: 65 8b 05 42 1d f3 7d mov %gs:0x7df31d42(%rip),%eax # 0x7df31d67
25: 85 c0 test %eax,%eax
27: 74 25 je 0x4e
29: c3 ret
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb cc jmp 0xfffffffffffffffa
2e: 48 c7 c7 d9 53 42 83 mov $0xffffffff834253d9,%rdi
35: e8 4d ec a6 00 call 0xa6ec87
3a: 65 gs
3b: 66 data16
3c: 8b .byte 0x8b
3d: 05 .byte 0x5
3e: 25 .byte 0x25
3f: 19 .byte 0x19

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb cc jmp 0xffffffffffffffd0
4: 48 c7 c7 d9 53 42 83 mov $0xffffffff834253d9,%rdi
b: e8 4d ec a6 00 call 0xa6ec5d
10: 65 gs
11: 66 data16
12: 8b .byte 0x8b
13: 05 .byte 0x5
14: 25 .byte 0x25
15: 19 .byte 0x19
4,1519,12767657259,-;RSP: 0018:fffffe00000f69a0 EFLAGS: 00010006
4,1520,12767657260,-;RAX: 0000000080110203 RBX: ffff8b0e05bd2000 RCX: ffff8b0e05bd2000
4,1521,12767657261,-;RDX: ffff8b0e0ac28000 RSI: 0000000000000201 RDI: ffffffffc12f12c3
4,1522,12767657262,-;RBP: ffff8b0e0c977a30 R08: fffffe00000f69e8 R09: ffff8b0e0d085000
4,1523,12767657263,-;R10: ffff8b0e03234300 R11: 0000000000000fff R12: ffff8b0e0d0850d0
4,1524,12767657264,-;R13: fffffe00000f69e8 R14: ffff8b0e0ddfc980 R15: ffff8b0e0d085a58
4,1525,12767657265,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
4,1526,12767657266,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
----------
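
In case it helps anyone reading the dump above: each record starts with
the header that netconsole's extended format prepends, which as far as I
understand it is level, sequence number, timestamp, a continuation flag,
and then optional key=value pairs such as ncfrag=<offset>/<total> for
messages split across packets. Below is a minimal Python sketch for
splitting that header off, assuming that field layout and assuming the
timestamp is in microseconds. The names parse_record and frag_span are
only illustrative and not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed header layout: level,seq,timestamp,flag[,key=value...];text
HEADER = re.compile(
    r"^(\d+),(\d+),(\d+),([^,;]*)((?:,[^,;=]+=[^,;]*)*);(.*)$", re.S
)


class Record(NamedTuple):
    level: int          # first header field
    seq: int            # record sequence number
    timestamp_us: int   # timestamp (assumed to be microseconds)
    flag: str           # '-' for a normal record, 'c' for a continuation
    fields: dict        # optional key=value pairs, e.g. {'ncfrag': '0/986'}
    text: str           # the message after the ';'


def parse_record(line: str) -> Optional[Record]:
    """Split one record into header fields and text.

    Returns None for lines that carry no header, such as the
    disassembly lines in the dump.
    """
    m = HEADER.match(line)
    if m is None:
        return None
    level, seq, ts, flag, extra, text = m.groups()
    fields = dict(kv.split("=", 1) for kv in extra.split(",") if kv)
    return Record(int(level), int(seq), int(ts), flag, fields, text)


def frag_span(rec: Record) -> Optional[Tuple[int, int]]:
    """Return (byte_offset, total_bytes) if the record is a fragment."""
    frag = rec.fields.get("ncfrag")
    if frag is None:
        return None
    offset, total = frag.split("/")
    return int(offset), int(total)
```

Fragments that share a sequence number could then be sorted by the
offset from frag_span and concatenated to rebuild the long "Modules
linked in" lines.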

Thanks,
Michele Ballabio