Re: [RFC PATCH v2] cgroup: Track time in cgroup v2 freezer

From: Tiffany Yang
Date: Tue Jul 22 2025 - 18:16:35 EST


Michal Koutný <mkoutny@xxxxxxxx> writes:

I'd like to incorporate the reason from your other mail:
| Since there isn't yet a clear way to identify a set of "lost" time
| that everyone (or at least a wider group of users) cares about, it
| seems like iterating over components of interest is the best way
into this commit message (because that's a stronger ponit that your use
case alone).


Any feedback would be much appreciated!

I can see benefits of this new stat field conceptually, I have some
remarks to implementation and suggestions to conventions below.

--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
it's possible to delete a frozen (and empty) cgroup, as well as
create new sub-cgroups.

+ cgroup.freeze.stat

With the given implementation (and use scenario), this'd better exposed
in
cgroup.freeze.stat.local

I grok the hierarchical summing would make little sense and it'd make
implementaion more complex. With that I'm thinking about formulation:

Cumulative time that cgroup has spent between freezing and
thawing, regardless of whether by self or ancestor cgroups. NB
(not) reaching "frozen" state is not accounted here.

+ A read-only flat-keyed file which exists in non-root cgroups.
+ The following entry is defined:
+
+ freeze_time_total_ns
+ Cumulative time that this cgroup has spent in the freezing
+ state, regardless of whether or not it reaches "frozen".
+

Rather use microseconds, it's the cgroup API convention and I'm not
sure nanosecods exposed here are the needed precision.


Ack.

1 _____
frozen 0 __/ \__
ab cd

Yeah, I find the mesurent between a and c the sanest.


+static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
+{
+ struct cgroup *cgrp = seq_css(seq)->cgroup;
+ u64 freeze_time = 0;
+
+ spin_lock_irq(&css_set_lock);
+ if (test_bit(CGRP_FREEZE, &cgrp->flags))
+ freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
+
+ freeze_time += cgrp->freezer.freeze_time_total_ns;
+ spin_unlock_irq(&css_set_lock);

I don't like taking this spinlock only for the matter of reading this
attribute. The intention should be to keep the (un)freezeing mostly
unaffected at the expense of these readers (seqcount or u64 stats?).


Ah, thank you for this suggestion! I noticed that none of the other
seq_file read implementations took a lock, so I thought this might be a
point of contention. I'll try a seqlock in the next version of the
patch.

Alternative approach: either there's outer watcher who can be notified
by cgroup.events:frozen or it's an inner watcher who couldn't actively
read the field anyway. So the field could only show completed
freeze/thaw cycles from the past (i.e. not substitute clock_gettime(2)
when the cgroup is frozen), which could simplify querying the flag too.


This is a good observation. This approach does simplify things, but
even though it would work for our use case, I feel like this value
would be less useful for the outer watcher case, especially in the case
where the cgroup never reaches the frozen state.

@@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
* if the parent has to be frozen, the child has too.
*/
cgrp->freezer.e_freeze = parent->freezer.e_freeze;
+ cgrp->freezer.freeze_time_total_ns = 0;

struct cgroup is kzalloc'd, this is unnecessary

Thank you for all your feedback! I'll make sure to incorporate these
suggestions into the next version.

--
Tiffany Y. Yang