On Thu, Jul 17, 2025 at 3:02 AM Zihuan Zhang <zhangzihuan@xxxxxxxxxx> wrote:
> Hi Rafael,
>
> On 2025/7/16 20:26, Rafael J. Wysocki wrote:
>> Hi,
>>
>> On Wed, Jul 16, 2025 at 8:26 AM Zihuan Zhang <zhangzihuan@xxxxxxxxxx> wrote:
>>> Hi all,
>>>
>>> This patch series improves the performance of the process freezer by
>>> skipping zombie tasks during freezing.
>>>
>>> In the suspend and hibernation paths, the freezer traverses all tasks
>>> and attempts to freeze them. However, zombie tasks (EXIT_ZOMBIE with
>>> PF_EXITING) are already dead: they are not schedulable and cannot enter
>>> the refrigerator. Attempting to freeze such tasks is redundant and
>>> unnecessarily increases freezing time.
>>>
>>> In particular, on systems under fork storm conditions (e.g., many
>>> short-lived processes quickly becoming zombies), the number of zombie tasks
>>> can spike into the thousands or more. We observed that this causes the
>>> freezer loop to waste significant time processing tasks that are guaranteed
>>> to not need freezing.
>>
>> I think that the discussion with Peter regarding this has not been concluded.
>>
>> I thought that there was an alternative patch proposed during that
>> discussion. If I'm not mistaken about this, what happened to that
>> patch?
>>
>> Thanks!
>
> Currently, the general consensus from the discussion is that skipping
> zombie or dead tasks can help reduce locking overhead during freezing.

Peter doesn't seem to be convinced that this is the case.

> The remaining question is how best to implement that.
>
> Peter suggested skipping all tasks with PF_NOFREEZE, which would make
> the logic more general and cover all cases. However, as Oleg pointed
> out, the current implementation based on PF_NOFREEZE might be problematic.
>
> My current thought is that exit_state already reliably covers all
> exiting user processes, and it's a good fit for skipping user-space
> tasks. For the kernel side, we may safely skip a few kernel threads like
> kthreadd that set PF_NOFREEZE and never change it; we can consider
> refining this further in the future.

There is the counter argument of special-casing of p->exit_state and
the relatively weak justification for it.

You have created a synthetic workload where it matters, but how likely
is it to be the case in practice?