Re: [RFC PATCH] PM: Optionally block user fork during freeze to improve performance

From: Zihuan Zhang
Date: Sun Jun 15 2025 - 23:47:03 EST


Hi Michal,

Thanks for the question.

On 2025/6/13 15:05, Michal Hocko wrote:
On Fri 13-06-25 10:37:42, Zihuan Zhang wrote:
Hi David,
Thanks for your advice!

On 2025/6/10 18:50, David Hildenbrand wrote:
Can't this problem be mitigated by simply not scheduling the new fork'ed
process while the system is frozen?

Or what exact scenario are you worried about?
Let me revisit the core issue for clarity. Under normal conditions, most
processes in the system are in a sleep state, and only a few are runnable.
So even with thousands of processes, the freezer generally works reliably
and completes within a reasonable time.
How do you define reasonable time?


To clarify: freezing a single process typically takes only a few tens of microseconds. Between retries, the freezer sleeps in usleep_range(); the delay is about 1 ms for the first retry and doubles in subsequent rounds. Despite this delay, in our tests around 10% of the processes were not frozen during the first pass and had to be retried.

This suggests that even with a reasonably sufficient delay, some newly forked processes do not get frozen in time during the first iteration, simply due to timing. The freeze latency itself remains small, but not all processes are caught on the first try.
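As a rough illustration of the retry delay described above, the sketch below adds up the sleep time for a given number of retries. It is a userspace model, not the kernel code: the function name is hypothetical, and the 8 ms cap on the doubling is an assumption made for this sketch.

```c
#include <assert.h>

#define USEC_PER_MSEC   1000u
/* Assumption for this sketch: the delay stops doubling once it reaches 8 ms. */
#define MAX_SLEEP_USECS (8u * USEC_PER_MSEC)

/*
 * Hypothetical helper: upper bound, in microseconds, on the time spent
 * sleeping if the freezer has to sleep `retries` times between passes.
 * The first sleep is 1 ms and each later sleep doubles up to the cap.
 */
static unsigned int total_retry_sleep_usecs(unsigned int retries)
{
    unsigned int sleep_usecs = USEC_PER_MSEC;
    unsigned int total = 0;

    for (unsigned int i = 0; i < retries; i++) {
        assert(sleep_usecs <= MAX_SLEEP_USECS);
        total += sleep_usecs;
        if (sleep_usecs < MAX_SLEEP_USECS)
            sleep_usecs *= 2;
    }
    return total;
}
```

Under these assumptions, a single retry costs at most about 1 ms of sleep, while four retries already add up to about 15 ms, which is why avoidable retries matter more than the per-task freeze latency.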
However, in our fork-based test scenario, we observed repeated freeze
retries.
Does this represent any real life scenario that happens on your system?
In other words how often do you miss your "reasonable time" threshold
while running a regular workload. Does the freezer ever fail?

[...]
In our test scenario, although new processes can indeed be created during the usleep_range() intervals between freeze iterations, it’s actually difficult to make the freezer fail outright. This is because user processes are forcibly frozen: when they return to user space and check for pending signals, they enter try_to_freeze() and transition into the refrigerator.
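The reason a user task cannot avoid freezing can be modelled in a few lines. This is a simplified userspace sketch, not kernel code: struct model_task and both functions are hypothetical names, and the real path goes through the pending-signal check and try_to_freeze().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a task; this is not the kernel's task_struct. */
struct model_task {
    bool freeze_requested; /* set by the freezer, checked like a pending signal */
    bool frozen;           /* true once the task is in the refrigerator */
};

/* Freezer side: request a freeze. It takes effect at the task's next checkpoint. */
static void model_request_freeze(struct model_task *t)
{
    t->freeze_requested = true;
}

/*
 * Task side: called on the way back to user space.
 * Returns true if the task entered the frozen state at this checkpoint.
 */
static bool model_return_to_user(struct model_task *t)
{
    if (!t->freeze_requested || t->frozen)
        return false;
    t->frozen = true;
    return true;
}
```

In this model a task that keeps running still freezes the next time it heads back to user space, so forking only delays the freezer; it does not defeat it.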

However, since the scheduler is fair by design, it gives both newly forked tasks and yet-to-be-frozen tasks a chance to run. This competition for CPU time can slightly delay the overall freeze process. While this typically doesn’t lead to failure, it does cause more retries than necessary, especially under CPU pressure.

Given that freezing is a clearly defined and semantically critical state transition, we believe it makes sense to prioritize the execution of tasks that are pending freezing over newly forked ones, particularly in resource-constrained environments.
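To make the prioritization idea concrete, here is a toy selection function. It is only a sketch of the policy and does not reflect how the kernel scheduler picks tasks: struct model_runnable, its fields, and model_pick_next() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a runnable task. */
struct model_runnable {
    bool frozen;               /* already in the refrigerator */
    bool forked_during_freeze; /* created after freezing started */
};

/*
 * Return the index of the task to run next, or -1 if every task is frozen.
 * While freezing, tasks that existed before the freeze started are preferred;
 * tasks forked during the freeze run only when no such task is left.
 */
static int model_pick_next(const struct model_runnable *tasks, size_t n,
                           bool freezing)
{
    int fallback = -1;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].frozen)
            continue;
        if (!freezing || !tasks[i].forked_during_freeze)
            return (int)i;
        if (fallback < 0)
            fallback = (int)i;
    }
    return fallback;
}
```

The newly forked task is never starved in this model; it simply runs after the tasks that are still pending freezing, which is the ordering argued for above.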
You’re right: blocking fork() is quite intrusive, so it’s worth exploring
alternatives. We’ll try implementing your idea of preventing the newly
forked process from being scheduled while the system is freezing, rather
than failing the fork() call outright.
Just curious, are you interested in global freezer only or is the cgroup
freezer involved as well?

At this stage, our focus is mainly on the global freezer during system suspend and hibernate (S3/S4). However, the patch itself is based on the generic freezing() and freeze_task() logic, so it should also work with the cgroup freezer.