Re: [dm-devel] Processes hung in "D" state in ext4, mm, md and dmcrypt

From: Yu Kuai
Date: Wed Jul 26 2023 - 22:38:53 EST


Hi,

在 2023/07/26 18:02, David Howells 写道:
Hi,

With 6.5-rc2 (6.5.0-0.rc2.20230721gitf7e3a1bafdea.20.fc39.x86_64), I'm seeing
a bunch of processes getting stuck in the D state on my desktop after a few
hours of reading email and compiling stuff. It's happened every day this week
so far and I managed to grab stack traces of the stuck processes this morning
(see attached).

There are two blockdevs involved below, /dev/md2 and /dev/md3. md3 is a raid1
array with two partitions with an ext4 partition on it. md2 is similar but
it's dm-crypted and ext4 is on top of that.

David
---

1015 ? D 0:04 [md2_raid1]
1074 ? D 0:00 [jbd2/sda6-8]
1138 ? D 0:00 [jbd2/md3-8]
1167 ? D 0:10 [dmcrypt_write/253:0]
1202 ? D 0:03 [jbd2/dm-0-8]
117547 ? D 5:12 [kworker/u16:8+flush-9:3]
121540 ? D 0:46 [kworker/u16:10+flush-253:0]
125431 pts/2 Dl+ 0:00 emacs .stgit-edit.txt
125469 ? D 0:00 /usr/libexec/nmh/rcvstore +kernel

===1015===
PID TTY STAT TIME COMMAND
1015 ? D 0:04 [md2_raid1]
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_daemon_work+0x183/0x3b0
[<0>] md_check_recovery+0x42/0x5a0
[<0>] raid1d+0x87/0x16f0 [raid1]
[<0>] md_thread+0xab/0x190
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

This means either the io to write super_block is stuck in underlying
disks or writing super_block is broken, I think it's probably the
former case. You'll need to locate where this io is now. If it can
be sure that there is no io in underlying disks, then this problem
is related to raid.

===1074===
PID TTY STAT TIME COMMAND
1074 ? D 0:00 [jbd2/sda6-8]
[<0>] jbd2_journal_commit_transaction+0x11a6/0x1a20
[<0>] kjournald2+0xad/0x280
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===1138===
PID TTY STAT TIME COMMAND
1138 ? D 0:00 [jbd2/md3-8]
[<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
[<0>] kjournald2+0xad/0x280
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===1167===
PID TTY STAT TIME COMMAND
1167 ? D 0:10 [dmcrypt_write/253:0]
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_unplug+0xad/0x120
[<0>] flush_bio_list+0xf3/0x100 [raid1]
[<0>] raid1_unplug+0x3b/0xb0 [raid1]
[<0>] __blk_flush_plug+0xd8/0x160
[<0>] blk_finish_plug+0x29/0x40
[<0>] dmcrypt_write+0x132/0x140 [dm_crypt]
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===1202===
PID TTY STAT TIME COMMAND
1202 ? D 0:03 [jbd2/dm-0-8]
[<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
[<0>] kjournald2+0xad/0x280
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===117547===
PID TTY STAT TIME COMMAND
117547 ? D 5:12 [kworker/u16:8+flush-9:3]
[<0>] blk_mq_get_tag+0x11e/0x2b0

Is this one of raid underlying disks? If so, this looks like io is stuck
in underlying disks.

Thanks,
Kuai

[<0>] __blk_mq_alloc_requests+0x1bc/0x350
[<0>] blk_mq_submit_bio+0x2c7/0x680
[<0>] __submit_bio+0x8b/0x170
[<0>] submit_bio_noacct_nocheck+0x159/0x370
[<0>] __block_write_full_folio+0x1e1/0x400
[<0>] writepage_cb+0x1a/0x70
[<0>] write_cache_pages+0x144/0x3b0
[<0>] do_writepages+0x164/0x1e0
[<0>] __writeback_single_inode+0x3d/0x360
[<0>] writeback_sb_inodes+0x1ed/0x4b0
[<0>] __writeback_inodes_wb+0x4c/0xf0
[<0>] wb_writeback+0x298/0x310
[<0>] wb_workfn+0x35b/0x510
[<0>] process_one_work+0x1de/0x3f0
[<0>] worker_thread+0x51/0x390
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===121540===
PID TTY STAT TIME COMMAND
121540 ? D 0:46 [kworker/u16:10+flush-253:0]
[<0>] folio_wait_bit_common+0x13d/0x350
[<0>] mpage_prepare_extent_to_map+0x309/0x4d0
[<0>] ext4_do_writepages+0x25d/0xc90
[<0>] ext4_writepages+0xad/0x180
[<0>] do_writepages+0xcf/0x1e0
[<0>] __writeback_single_inode+0x3d/0x360
[<0>] writeback_sb_inodes+0x1ed/0x4b0
[<0>] __writeback_inodes_wb+0x4c/0xf0
[<0>] wb_writeback+0x298/0x310
[<0>] wb_workfn+0x35b/0x510
[<0>] process_one_work+0x1de/0x3f0
[<0>] worker_thread+0x51/0x390
[<0>] kthread+0xe5/0x120
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

===125431===
PID TTY STAT TIME COMMAND
125431 pts/2 Dl+ 0:00 emacs .stgit-edit.txt
[<0>] jbd2_log_wait_commit+0xd8/0x140
[<0>] ext4_sync_file+0x1cc/0x380
[<0>] __x64_sys_fsync+0x3b/0x70
[<0>] do_syscall_64+0x5d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

===125469===
PID TTY STAT TIME COMMAND
125469 ? D 0:00 /usr/libexec/nmh/rcvstore +kernel
[<0>] folio_wait_bit_common+0x13d/0x350
[<0>] folio_wait_writeback+0x2c/0x90
[<0>] truncate_inode_partial_folio+0x5e/0x1a0
[<0>] truncate_inode_pages_range+0x1da/0x400
[<0>] truncate_pagecache+0x47/0x60
[<0>] ext4_setattr+0x685/0xba0
[<0>] notify_change+0x1e0/0x4a0
[<0>] do_truncate+0x98/0xf0
[<0>] do_sys_ftruncate+0x15c/0x1b0
[<0>] do_syscall_64+0x5d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel

.