6.9/BUG: Bad page state in process kswapd0 pfn:d6e840

From: Mikhail Gavrilov
Date: Mon Mar 18 2024 - 05:55:41 EST


Hi,
Today, for the first time, I saw "BUG: Bad page state in process
kswapd0 pfn:d6e840".

Trace:
BUG: Bad page state in process kswapd0 pfn:d6e840
page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c pfn:0xd6e840
aops:btree_aops ino:1
flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
page_type: 0xffffffff()
raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping
Modules linked in: uvcvideo uvc videobuf2_vmalloc videobuf2_memops
videobuf2_v4l2 videobuf2_common videodev rndis_host uas cdc_ether
usbnet usb_storage mii overlay tun uinput snd_seq_dummy snd_hrtimer
rfcomm nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep sunrpc
binfmt_misc snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi mc
amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi mt76x2u
mt7921e edac_mce_amd snd_hda_intel mt76x2_common mt7921_common
snd_intel_dspcfg mt76x02_usb snd_intel_sdw_acpi mt76_usb mt792x_lib
snd_hda_codec mt76x02_lib mt76_connac_lib btusb btrtl mt76
snd_hda_core btintel kvm_amd btbcm btmtk snd_hwdep mac80211 snd_seq
kvm vfat snd_seq_device bluetooth libarc4 fat irqbypass snd_pcm rapl
cfg80211 snd_timer wmi_bmof pcspkr snd i2c_piix4 k10temp rfkill
soundcore joydev apple_mfi_fastcharge gpio_amdpt gpio_generic loop
nfnetlink zram
amdgpu hid_apple crct10dif_pclmul crc32_pclmul crc32c_intel
polyval_clmulni polyval_generic amdxcp i2c_algo_bit drm_ttm_helper ttm
ghash_clmulni_intel drm_exec gpu_sched drm_suballoc_helper
sha512_ssse3 nvme drm_buddy sha256_ssse3 sha1_ssse3 drm_display_helper
nvme_core sp5100_tco r8169 ccp cec realtek nvme_auth video wmi
ip6_tables ip_tables fuse
CPU: 17 PID: 268 Comm: kswapd0 Tainted: G W L ------- --- 6.9.0-0.rc0.20240315gite5eb28f6d1af.8.fc41.x86_64+debug #1
Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.82 01/24/2024
Call Trace:
<TASK>
dump_stack_lvl+0xce/0xf0
bad_page+0xd4/0x230
? __pfx_bad_page+0x10/0x10
? page_bad_reason+0x9d/0x1f0
free_unref_page_prepare+0x80e/0xe00
? __pfx___mem_cgroup_uncharge_folios+0x10/0x10
? __pfx_lock_release+0x10/0x10
free_unref_folios+0x26e/0x9c0
? _raw_spin_unlock_irq+0x28/0x60
move_folios_to_lru+0xc0e/0xe80
? __pfx_move_folios_to_lru+0x10/0x10
evict_folios+0xe5c/0x1610
? evict_folios+0x5f3/0x1610
? __pfx_lock_acquire+0x10/0x10
? __pfx_evict_folios+0x10/0x10
? rcu_is_watching+0x15/0xb0
? rcu_is_watching+0x15/0xb0
? __pfx_lock_acquire+0x10/0x10
? __pfx___might_resched+0x10/0x10
? mem_cgroup_get_nr_swap_pages+0x25/0x120
try_to_shrink_lruvec+0x4d8/0x800
? rcu_is_watching+0x15/0xb0
? __pfx_try_to_shrink_lruvec+0x10/0x10
? lock_release+0x581/0xc60
? __pfx_lock_release+0x10/0x10
shrink_one+0x37c/0x6f0
shrink_node+0x1d60/0x3080
? shrink_node+0x1d47/0x3080
? shrink_node+0x1afa/0x3080
? __pfx_shrink_node+0x10/0x10
? pgdat_balanced+0x7b/0x1a0
balance_pgdat+0x88b/0x1480
? rcu_is_watching+0x15/0xb0
? __pfx_balance_pgdat+0x10/0x10
? __switch_to+0x409/0xdd0
? __switch_to_asm+0x37/0x70
? __schedule+0x10cd/0x61d0
? __pfx_debug_object_free+0x10/0x10
? __try_to_del_timer_sync+0xe5/0x140
? __pfx_lock_release+0x10/0x10
? __pfx___might_resched+0x10/0x10
? set_pgdat_percpu_threshold+0x1c4/0x2f0
? __pfx_calculate_pressure_threshold+0x10/0x10
kswapd+0x51d/0x910
? __pfx_kswapd+0x10/0x10
? __pfx_autoremove_wake_function+0x10/0x10
? lockdep_hardirqs_on+0x80/0x110
? __kthread_parkme+0xba/0x1f0
? __pfx_kswapd+0x10/0x10
kthread+0x2ed/0x3c0
? _raw_spin_unlock_irq+0x28/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
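
For reference, the two "raw:" lines in the dump can be cross-checked against
the decoded fields with a short script. This is only a sketch under stated
assumptions: it assumes the usual x86_64 little-endian layout of struct page
(eight 64-bit words: flags, lru.next, lru.prev, mapping, index, private,
mapcount/refcount, memcg data), and the uptodate bit position is inferred
from this dump rather than taken from the kernel headers. The helper name
decode_raw_page is made up for illustration.

```python
# Sketch: decode the eight raw words of the struct page dump shown above.
# Layout and bit positions are assumptions (x86_64, little-endian).

RAW = (
    "0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0 "
    "00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000"
)

# List poison values as seen on x86_64 with the default poison pointer delta.
LIST_POISON1 = 0xDEAD000000000100
LIST_POISON2 = 0xDEAD000000000122

# Inferred from "flags: 0x17ffffe0000008(uptodate|...)": the only low bit set.
PG_UPTODATE = 1 << 3


def decode_raw_page(raw):
    """Hypothetical helper: split the raw words into named fields."""
    words = [int(w, 16) for w in raw.split()]
    flags, lru_next, lru_prev, mapping, index, _private, counts, _memcg = words

    # Low 32 bits hold _mapcount (shared with page_type), stored as count - 1.
    mapcount_raw = counts & 0xFFFFFFFF
    if mapcount_raw >= 1 << 31:
        mapcount_raw -= 1 << 32  # reinterpret as a signed 32-bit value

    return {
        "uptodate": bool(flags & PG_UPTODATE),
        "lru_poisoned": (lru_next, lru_prev) == (LIST_POISON1, LIST_POISON2),
        "mapping_non_null": mapping != 0,
        "index": index,
        "mapcount": mapcount_raw + 1,
        "refcount": counts >> 32,  # high 32 bits hold _refcount
    }


if __name__ == "__main__":
    for name, value in decode_raw_page(RAW).items():
        print(f"{name}: {value}")
```

Under these assumptions the raw words agree with the decoded line: refcount
and mapcount are both 0, the lru pointers hold the list poison values, and
the fourth word (mapping) is still set, which matches the stated reason
"non-NULL mapping".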

A quick search did not turn up a reassuring answer.
If this really is a hardware problem, it is unclear which component is
the culprit. The memory was checked a year ago with memtest86 and no
errors were found.
Given how randomly this bug message appeared, it may be worth ignoring,
but an unpleasant aftertaste remains.

Machine spec: https://linux-hardware.org/?probe=24b7696f8a
I have attached the full kernel log and build config below.

--
Best Regards,
Mike Gavrilov.

Attachment: dmesg-BUG-Bad-page-state-in-process-kswapd0.zip
Description: Zip archive

Attachment: config-6.9.0-0.rc0.20240315gite5eb28f6d1af.8.fc41.zip
Description: Zip archive