Re: [RFC PATCH] PM: Optionally block user fork during freeze to improve performance

From: Zihuan Zhang
Date: Thu Jun 12 2025 - 22:39:20 EST

Next message: Bagas Sanjaya: "[PATCH] Documentation: ublk: Separate UBLK_F_AUTO_BUF_REG fallback behavior sublists"
Previous message: Tao Chen: "Re: [PATCH bpf-next] bpf: clear user buf when bpf_d_path failed"
In reply to: David Hildenbrand: "Re: [RFC PATCH] PM: Optionally block user fork during freeze to improve performance"
Next in thread: Michal Hocko: "Re: [RFC PATCH] PM: Optionally block user fork during freeze to improve performance"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi David,
Thanks for your advice!

On 2025/6/10 18:50, David Hildenbrand wrote:

> Can't this problem be mitigated by simply not scheduling the new fork'ed
> process while the system is frozen?
>
> Or what exact scenario are you worried about?

Let me revisit the core issue for clarity. Under normal conditions, most processes in the system are sleeping and only a few are runnable. So even with thousands of processes, the freezer generally works reliably and completes within a reasonable time.
However, in our fork-based test scenario, we observed repeated freeze retries. This is not caused by the process count directly, but by a scheduling behavior during the freeze phase. Specifically, the retry loop in try_to_freeze_tasks() (kernel/power/process.c) sleeps, and therefore yields the CPU, between passes:

    /*
     * We need to retry, but first give the freezing tasks some
     * time to enter the refrigerator. Start with an initial
     * 1 ms sleep followed by exponential backoff until 8 ms.
     */
    usleep_range(sleep_usecs / 2, sleep_usecs);
    if (sleep_usecs < 8 * USEC_PER_MSEC)
        sleep_usecs *= 2;

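For reference, here is a minimal userspace model of the backoff arithmetic in that snippet, assuming sleep_usecs starts at 1 ms as the comment says. freezer_sleep_window() and MODEL_USEC_PER_MSEC are hypothetical names used only for illustration. The model returns the usleep_range() bounds for a given retry: 0.5-1 ms, 1-2 ms, 2-4 ms, and then 4-8 ms for every later retry. So on each retry the freezer hands the CPU to other runnable tasks (including freshly forked children) for up to 8 ms.

```c
#include <assert.h>

/* Stand-in for the kernel's USEC_PER_MSEC (1000). */
#define MODEL_USEC_PER_MSEC 1000u

/* usleep_range() bounds, in microseconds, for one retry. */
struct sleep_window {
	unsigned int min_us;
	unsigned int max_us;
};

/*
 * Hypothetical helper: bounds the freezer would pass to usleep_range()
 * on retry number `retry` (0-based), assuming sleep_usecs starts at
 * 1 ms and doubles after each sleep while it is below 8 ms.
 */
struct sleep_window freezer_sleep_window(unsigned int retry)
{
	unsigned int sleep_usecs = MODEL_USEC_PER_MSEC; /* initial 1 ms */
	struct sleep_window w;

	/* Replay the doubling step once per earlier retry. */
	for (unsigned int i = 0; i < retry; i++) {
		if (sleep_usecs < 8 * MODEL_USEC_PER_MSEC)
			sleep_usecs *= 2;
	}

	w.min_us = sleep_usecs / 2;
	w.max_us = sleep_usecs;
	assert(w.min_us <= w.max_us);
	return w;
}
```

For example, freezer_sleep_window(0) gives 500..1000 us, and any retry from 3 onward gives 4000..8000 us.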
This mechanism is usually effective because most tasks are asleep and quickly enter the frozen state. But with concurrent fork() bombs, we observed that relinquishing the CPU here gives newly forked children a chance to run, delaying or even stalling the freezer's progress.
When only a single fork loop is running, the forking task is usually frozen before the next retry. But when multiple forkers compete for the CPU, we observed the todo count (the number of tasks not yet frozen) increase and the freezer retry repeatedly.
So while preventing the scheduling of newly forked processes would solve the problem at its root, it would require deeper architectural changes (e.g., task-level flags or restrictions at the scheduler level).
We initially considered whether replacing usleep_range() with a non-yielding wait might shrink this contention window. However, this approach turned out to be counterproductive: it starves normal user tasks that need CPU time to reach their try_to_freeze() checkpoint, ultimately making the freeze slower.
You're right that blocking fork() is quite intrusive, so it's worth exploring alternatives. We'll try implementing your idea of preventing the newly forked process from being scheduled while the system is freezing, rather than failing the fork() call outright.

This may allow us to maintain compatibility with existing userspace while avoiding interference with the freezer traversal. We’ll evaluate whether this approach can reliably mitigate the issue (especially the scheduling race window between copy_process() and freeze_task()), and report back with results.
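To illustrate the difference between failing creation and deferring execution, here is a userspace analogy only, not kernel code. All names (birth_gate, gate_wait(), worker(), and so on) are hypothetical, and a pthread condition variable stands in for the scheduler. A new worker can always be created, even mid-freeze, but its first action is to park until the freeze ends. Creation never fails, and the parked child does not compete for the CPU. In the kernel, the equivalent check would have to take effect before the child first runs, which is where the copy_process()/freeze_task() window matters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate: new workers may be born, but not run, while freezing. */
struct birth_gate {
	pthread_mutex_t lock;
	pthread_cond_t thawed;
	bool freezing;
};

void gate_init(struct birth_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->thawed, NULL);
	g->freezing = false;
}

/* Start of a "freeze": workers created from now on will park. */
void gate_freeze(struct birth_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->freezing = true;
	pthread_mutex_unlock(&g->lock);
}

/* End of a "freeze": release every worker parked in gate_wait(). */
void gate_thaw(struct birth_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->freezing = false;
	pthread_cond_broadcast(&g->thawed);
	pthread_mutex_unlock(&g->lock);
}

/* First thing a new worker does: sleep while a freeze is in progress. */
void gate_wait(struct birth_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->freezing)
		pthread_cond_wait(&g->thawed, &g->lock);
	pthread_mutex_unlock(&g->lock);
}

struct worker_arg {
	struct birth_gate *gate;
	int ran; /* set only after gate_wait() returns */
};

void *worker(void *p)
{
	struct worker_arg *a = p;

	assert(a->gate != NULL);
	gate_wait(a->gate);
	a->ran = 1; /* the worker's real work would start here */
	return NULL;
}
```

Usage: call gate_freeze() before a freeze pass, keep creating workers with pthread_create(&t, NULL, worker, &arg) as usual, and call gate_thaw() to let any workers born during the freeze start running.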

Best regards,
Zihuan Zhang
