On Tue, Aug 12, 2025 at 01:57:49PM +0800, Zihuan Zhang wrote:
Hi all,
We encountered an issue where the number of freeze retries increased due to
processes stuck in D state. The logs point to jbd2-related activity.
log1:
[ 6616.650482] task:ThreadPoolForeg state:D stack:0 pid:262026 tgid:4065 ppid:2490 task_flags:0x400040 flags:0x00004004
[ 6616.650485] Call Trace:
[ 6616.650486] <TASK>
[ 6616.650489] __schedule+0x532/0xea0
[ 6616.650494] schedule+0x27/0x80
[ 6616.650496] jbd2_log_wait_commit+0xa6/0x120
[ 6616.650499] ? __pfx_autoremove_wake_function+0x10/0x10
[ 6616.650502] ext4_sync_file+0x1ba/0x380
[ 6616.650505] do_fsync+0x3b/0x80
log2:
[ 631.206315] jdb2_log_wait_log_commit completed (elapsed 0.002 seconds)
[ 631.215325] jdb2_log_wait_log_commit completed (elapsed 0.001 seconds)
[ 631.240704] jdb2_log_wait_log_commit completed (elapsed 0.386 seconds)
[ 631.262167] Filesystems sync: 0.424 seconds
[ 631.262821] Freezing user space processes
[ 631.263839] freeze round: 1, task to freeze: 852
[ 631.265128] freeze round: 2, task to freeze: 2
[ 631.267039] freeze round: 3, task to freeze: 2
[ 631.271176] freeze round: 4, task to freeze: 2
[ 631.279160] freeze round: 5, task to freeze: 2
[ 631.287152] freeze round: 6, task to freeze: 2
[ 631.295346] freeze round: 7, task to freeze: 2
[ 631.301747] freeze round: 8, task to freeze: 2
[ 631.309346] freeze round: 9, task to freeze: 2
[ 631.317353] freeze round: 10, task to freeze: 2
[ 631.325348] freeze round: 11, task to freeze: 2
[ 631.333353] freeze round: 12, task to freeze: 2
[ 631.341358] freeze round: 13, task to freeze: 2
[ 631.349357] freeze round: 14, task to freeze: 2
[ 631.357363] freeze round: 15, task to freeze: 2
[ 631.365361] freeze round: 16, task to freeze: 2
[ 631.373379] freeze round: 17, task to freeze: 2
[ 631.381366] freeze round: 18, task to freeze: 2
[ 631.389365] freeze round: 19, task to freeze: 2
[ 631.397371] freeze round: 20, task to freeze: 2
[ 631.405373] freeze round: 21, task to freeze: 2
[ 631.413373] freeze round: 22, task to freeze: 2
[ 631.421392] freeze round: 23, task to freeze: 1
[ 631.429948] freeze round: 24, task to freeze: 1
[ 631.438295] freeze round: 25, task to freeze: 1
[ 631.444546] jdb2_log_wait_log_commit completed (elapsed 0.249 seconds)
[ 631.446387] freeze round: 26, task to freeze: 0
[ 631.446390] Freezing user space processes completed (elapsed 0.183 seconds)
[ 631.446392] OOM killer disabled.
[ 631.446393] Freezing remaining freezable tasks
[ 631.446656] freeze round: 1, task to freeze: 4
[ 631.447976] freeze round: 2, task to freeze: 0
[ 631.447978] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 631.447980] PM: suspend debug: Waiting for 1 second(s).
[ 632.450858] OOM killer enabled.
[ 632.450859] Restarting tasks: Starting
[ 632.453140] Restarting tasks: Done
[ 632.453173] random: crng reseeded on system resumption
[ 632.453370] PM: suspend exit
[ 632.462799] jdb2_log_wait_log_commit completed (elapsed 0.000 seconds)
[ 632.466114] jdb2_log_wait_log_commit completed (elapsed 0.001 seconds)
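For reference, the "freeze round: N, task to freeze: M" lines are debug output
around the freezer retry loop; below is a simplified sketch (not verbatim) of
that loop in try_to_freeze_tasks() in kernel/power/process.c, with wakeup-event
checks and statistics trimmed. "task to freeze" presumably corresponds to the
todo counter, i.e. the tasks that could not be frozen in that pass.

static int try_to_freeze_tasks(bool user_only)
{
        unsigned long end_time = jiffies + msecs_to_jiffies(freeze_timeout_msecs);
        unsigned int sleep_usecs = USEC_PER_MSEC;
        struct task_struct *g, *p;
        unsigned int todo;

        while (true) {
                todo = 0;
                read_lock(&tasklist_lock);
                for_each_process_thread(g, p) {
                        /* true: freeze requested but p is not frozen yet,
                         * e.g. because it sleeps uninterruptibly in D state */
                        if (p == current || !freeze_task(p))
                                continue;
                        todo++;
                }
                read_unlock(&tasklist_lock);

                /* one "freeze round" per iteration; give up after the timeout */
                if (!todo || time_after(jiffies, end_time))
                        break;

                /* back off before the next round, doubling up to 8 ms */
                usleep_range(sleep_usecs / 2, sleep_usecs);
                if (sleep_usecs < 8 * USEC_PER_MSEC)
                        sleep_usecs *= 2;
        }

        return todo ? -EBUSY : 0;
}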
This is the reason:
[ 631.444546] jdb2_log_wait_log_commit completed (elapsed 0.249 seconds)
During freezing, user processes that are executing jbd2_log_wait_commit sit in
D state because this function sleeps in wait_event() and can take tens of
milliseconds to complete. When such a wait overlaps with the freezer, the
affected tasks cannot be frozen immediately, which causes the repeated freeze
retries seen above.
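The wait itself looks roughly like this (a simplified paraphrase of
jbd2_log_wait_commit() in fs/jbd2/journal.c; locking details and tracing are
trimmed). wait_event() sleeps in TASK_UNINTERRUPTIBLE, which is why the caller
shows up in D state and cannot be frozen until the commit finishes:

int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
{
        int err = 0;

        read_lock(&journal->j_state_lock);
        while (tid_gt(tid, journal->j_commit_sequence)) {
                read_unlock(&journal->j_state_lock);
                /* kick the journal thread, then sleep until it finishes tid */
                wake_up(&journal->j_wait_commit);
                wait_event(journal->j_wait_done_commit,
                           !tid_gt(tid, journal->j_commit_sequence));
                read_lock(&journal->j_state_lock);
        }
        read_unlock(&journal->j_state_lock);

        if (unlikely(is_journal_aborted(journal)))
                err = -EIO;
        return err;
}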
While we understand that jbd2 is a freezable kernel thread, we would like to
know if there is a way to freeze it earlier or freeze some critical
processes proactively to reduce this contention.
Freeze the filesystem before you start freezing kthreads? That should
quiesce the jbd2 workers and pause anyone trying to write to the fs.

Maybe the missing piece here is the device model not knowing how to call
bdev_freeze prior to a suspend?
That said, I think that doesn't 100% work for XFS because it has
kworkers for metadata buffer read completions, and freezes don't affect
read operations...
(just my clueless 2c)
--D
Thanks for your input and suggestions. Indeed, freezing the filesystem can
work. Currently, the suspend flow does not seem to invoke bdev_freeze(). Do
you have any plans or insights on improving this, or on integrating it more
smoothly into the device model and the suspend sequence?
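For reference, a minimal userspace illustration of the filesystem-freeze
approach (what fsfreeze(8) does via the FIFREEZE/FITHAW ioctls; the mount-point
argument and the suspend step are placeholders). The kernel-side variant being
discussed would call bdev_freeze()/freeze_super() from the suspend path instead
of relying on such a helper:

#include <fcntl.h>
#include <linux/fs.h>           /* FIFREEZE, FITHAW */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *mnt = argc > 1 ? argv[1] : "/mnt";
        int fd = open(mnt, O_RDONLY);   /* mount point of the fs to freeze */

        if (fd < 0 || ioctl(fd, FIFREEZE, 0) < 0) {
                perror("freeze");
                return 1;
        }
        /* ... suspend here, e.g. write "mem" to /sys/power/state ... */
        if (ioctl(fd, FITHAW, 0) < 0)
                perror("thaw");
        close(fd);
        return 0;
}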
On 2025/8/11 18:58, Michal Hocko wrote:
On Mon 11-08-25 17:13:43, Zihuan Zhang wrote:
On 2025/8/8 16:58, Michal Hocko wrote:
[...]
Also the interface seems to be really coarse grained and it can easily
turn out insufficient for other usecases while it is not entirely clear
to me how this could be extended for those.

We recognize that the current interface is relatively coarse-grained and
may not be sufficient for all scenarios. The present implementation is a
basic version.
Our plan is to introduce a classification-based mechanism that assigns
different freeze priorities according to process categories. For example,
filesystem and graphics-related processes will be given higher default
freeze priority, as they are critical in the freezing workflow. This
classification approach helps target important processes more precisely.
However, this requires further testing and refinement before full
deployment. We believe this incremental, category-based design will make the
mechanism more effective and adaptable over time while keeping it
manageable.
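To make the idea concrete, here is a purely hypothetical sketch of such a
classification pass; freeze_class, task_freeze_class() and the FREEZE_CLASS_*
names are invented for illustration and are not an existing interface (locking
omitted):

enum freeze_class {
        FREEZE_CLASS_CRITICAL,          /* e.g. filesystem/graphics related tasks */
        FREEZE_CLASS_NORMAL,
        FREEZE_CLASS_MAX,
};

static void freeze_tasks_by_class(void)
{
        struct task_struct *g, *p;
        enum freeze_class c;

        /* earlier classes get their freeze request first */
        for (c = 0; c < FREEZE_CLASS_MAX; c++) {
                for_each_process_thread(g, p) {
                        if (task_freeze_class(p) != c) /* hypothetical helper */
                                continue;
                        freeze_task(p);
                }
        }
}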

Unless there is a clear path for a more extendable interface then
introducing this one is a no-go. We do not want to grow different ways
to establish freezing policies.
But much more fundamentally. So far I haven't really seen any argument
why different priorities help with the underlying problem other than the
timing might be slightly different if you change the order of freezing.
This to me sounds like the proposed scheme mostly works around the
problem you are seeing and as such is not a really good candidate to be
merged as a long term solution. Not to mention with a user API that
needs to be maintained for ever.
So NAK from me on the interface.

Thanks for the feedback. I understand your concern that changing the freezer
priority order looks like working around the symptom rather than solving the
root cause.
Since the last discussion, we have analyzed the D-state processes further
and identified that the long wait time is caused by jbd2_log_wait_commit.
This wait happens because user tasks call into this function during
fsync/fdatasync and it can take tens of milliseconds to complete. When this
coincides with the freezer operation, the tasks are stuck in D state and
retried multiple times, increasing the total freeze time.
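A minimal reproducer for that pattern (the file path is arbitrary and should
sit on an ext4 filesystem; running a few instances while initiating suspend
should show the same kind of jbd2_log_wait_commit waits):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/tmp/fsync-test", O_CREAT | O_WRONLY | O_TRUNC, 0644);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        for (;;) {
                if (write(fd, "x", 1) != 1)
                        break;
                /* on ext4 this ends up waiting in jbd2_log_wait_commit() */
                fsync(fd);
        }
        close(fd);
        return 0;
}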
Although we know that jbd2 is a freezable kernel thread, we are exploring
whether freezing it earlier — or freezing certain key processes first —
could reduce this contention and improve freeze completion time.

I believe it would be more useful to find sources of those freezer
blockers and try to address those. Making more blocked tasks
__set_task_frozen compatible sounds like a general improvement in
itself.

We have already identified some causes of D-state tasks, many of which are
related to the filesystem. On some systems, certain processes frequently
execute ext4_sync_file, and under contention this can lead to D-state tasks.

Please work with maintainers of those subsystems to find proper
solutions.

We've pulled in the jbd2 maintainer to get feedback on whether changing the
freeze ordering for jbd2 is safe, or if there's a better approach to avoid
the repeated retries caused by this wait.