[PATCH 2/2] cgroup: explain the race between updater and flusher

From: Shakeel Butt
Date: Thu Jul 03 2025 - 16:01:06 EST


Currently the rstat updater and the flusher can race and cause a
scenario where the stats updater skips adding the css to the lockless
list but the flusher might not see the updates done by that skipped
updater. This is a benign race as a subsequent flusher will flush those
stats, and at the moment there are no rstat users that cannot tolerate
this kind of race. However a future user might want a stricter
guarantee, so let's add appropriate comments and data_race() tags to
ease the job of such future users.

Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
---
kernel/cgroup/rstat.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
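
Note: the flush-side pairing described in the comments below is not added
by this patch since no user needs it yet; if one appears, it would look
roughly like this inside css_process_update_tree() (sketch only, barrier
placement per the comment added below):

	while ((lnode = llist_del_first_init(lhead))) {
		struct css_rstat_cpu *rstatc;

		/*
		 * Hypothetical: pairs with an updater-side smp_mb() placed
		 * before css_rstat_updated(); it must sit after
		 * init_llist_node() (done in llist_del_first_init()) and
		 * before the per-cpu stats are flushed.
		 */
		smp_mb();

		rstatc = container_of(lnode, struct css_rstat_cpu, lnode);
		__css_process_update_tree(rstatc->owner, cpu);
	}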

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index c8a48cf83878..b98c03b1af25 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -60,6 +60,12 @@ static inline struct llist_head *ss_lhead_cpu(struct cgroup_subsys *ss, int cpu)
* Atomically inserts the css in the ss's llist for the given cpu. This is
* reentrant safe i.e. safe against softirq, hardirq and nmi. The ss's llist
* will be processed at the flush time to create the update tree.
+ *
+ * NOTE: if the user needs the guarantee that the updater either adds itself
+ * to the lockless list or the concurrent flusher flushes its updated stats,
+ * a memory barrier is needed before the call to css_rstat_updated() i.e. a
+ * barrier after updating the per-cpu stats and before calling
+ * css_rstat_updated().
*/
__bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
{
@@ -86,8 +92,13 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
return;

rstatc = css_rstat_cpu(css, cpu);
- /* If already on list return. */
- if (llist_on_list(&rstatc->lnode))
+ /*
+ * If already on the list, return. This check is racy and, if the
+ * guarantee that the updated stats are visible to a concurrent flusher
+ * is needed, an smp_mb() before css_rstat_updated() is required to pair
+ * with the smp_mb() in css_process_update_tree().
+ */
+ if (data_race(llist_on_list(&rstatc->lnode)))
return;

/*
@@ -145,9 +156,24 @@ static void css_process_update_tree(struct cgroup_subsys *ss, int cpu)
struct llist_head *lhead = ss_lhead_cpu(ss, cpu);
struct llist_node *lnode;

- while ((lnode = llist_del_first_init(lhead))) {
+ while ((lnode = data_race(llist_del_first_init(lhead)))) {
struct css_rstat_cpu *rstatc;

+ /*
+ * smp_mb() is needed here (more specifically in between
+ * init_llist_node() and the per-cpu stats flushing) if a
+ * rstat user requires the guarantee that an updater either
+ * adds itself to the lockless list or, having observed that
+ * it is already on the list, has its updated stats flushed
+ * by the flusher. The corresponding barrier pair for this
+ * one should be placed before css_rstat_updated() by the
+ * user.
+ *
+ * For now, there is no such user, so the barrier is not
+ * added here, but if such a use-case arises, please add
+ * smp_mb() here.
+ */
+
rstatc = container_of(lnode, struct css_rstat_cpu, lnode);
__css_process_update_tree(rstatc->owner, cpu);
}
--
2.47.1