Re: [PATCH v2] cgroup/rstat: change cgroup_base_stat to atomic
From: tj@xxxxxxxxxx
Date: Fri Jul 04 2025 - 13:58:08 EST
Hello,
On Fri, Jul 04, 2025 at 01:13:56PM +0000, Wlodarczyk, Bertrand wrote:
...
> After 54 units both solutions have the same result.
> What's the issue here? Why is a user seeing A = 1, B = 0, C = 0 at unit 22 (instead of spinning) a bad thing in the rstat scenario?
Because some stats are related to each other - e.g. in blkcg, BPS and IOPS.
Here, the cputime stats would overlap if we ever add [soft]irq time
breakdowns. Inconsistent snapshots can lead to nonsensical calculations
(divide by zero, underflow, and so on) in their users, just rare enough to
not be debugged easily but frequent enough to be a headache in larger /
longer deployments. And, because we can usually do better.
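To make the failure mode concrete, here is a minimal userspace sketch in C11.
It is not kernel code and not the rstat implementation: struct io_stat, the
split update_begin()/update_end() writer and read_try() are all hypothetical
names, and a single writer is assumed. It shows that per-field atomic loads
can return a pair that never existed together, while a seqcount-style reader
notices the in-progress update and asks the caller to retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of related counters; NOT the kernel's cgroup_base_stat. */
struct io_stat {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t bytes;
	_Atomic uint64_t ios;
};

struct io_snap {
	uint64_t bytes;
	uint64_t ios;
};

/* The writer is split in two halves so a caller can place a read in between. */
static void update_begin(struct io_stat *s, uint64_t bytes)
{
	atomic_fetch_add(&s->seq, 1);	/* seq becomes odd */
	atomic_fetch_add(&s->bytes, bytes);
}

static void update_end(struct io_stat *s, uint64_t ios)
{
	assert(atomic_load(&s->seq) & 1);	/* update_begin() must have run */
	atomic_fetch_add(&s->ios, ios);
	atomic_fetch_add(&s->seq, 1);	/* seq becomes even again */
}

/* Per-field atomic read: each load is fine, the pair may be inconsistent. */
static struct io_snap read_unsynced(struct io_stat *s)
{
	struct io_snap r;

	r.bytes = atomic_load(&s->bytes);
	r.ios = atomic_load(&s->ios);
	return r;
}

/* One seqcount-style attempt; returning false means "retry". */
static bool read_try(struct io_stat *s, struct io_snap *out)
{
	unsigned int start = atomic_load(&s->seq);

	if (start & 1)
		return false;		/* writer is mid-update */
	out->bytes = atomic_load(&s->bytes);
	out->ios = atomic_load(&s->ios);
	return atomic_load(&s->seq) == start;
}
```

If a reader calls read_unsynced() between update_begin(&s, 4096) and
update_end(&s, 1), it sees the bytes delta without the matching ios delta, so
an "average bytes per IO" computed from the deltas divides by zero. In the
same window read_try() returns false, and it succeeds with a matching pair
once the update has completed.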
> > Can you please try a different approach?
>
> In the last few days I've investigated this and had some success, but nowhere
> near the improvements yielded by using atomics. For the reasons I mentioned
> above, the locking approach is much more complex to optimize.
So, I'm not converting these stats to atomics. It's just not a good long
term direction. Please find a better solution. I'm pretty sure there are
multiple.
> > Yeah, I saw the benchmark but I was more curious what actual use case
> > would lead to behaviors like that because you'd have to hammer on those
> > stats really hard for this to be a problem. In most use cases that I'm
> > aware of, the polling frequencies of these stats are >= 1sec. I guess the
> > users in your use case were banging on them way harder, at least
> > previously.
>
> From what I know, the https://github.com/google/cadvisor instances
> deployed on the client machine hammered these stats. Sharing servers
> between independent teams or orgs in big corps is frequent. Every
> interested party deployed its own, or similar, instance. We can say just
> don't do that and be fine, but it will be happening anyway. It's better to
> just make rstats more robust.
I do think this is a valid use case. I just want to get some sense of the
numbers involved. Do you happen to know what frequency cAdvisor was polling
the stats at and how many instances were running? The numbers don't have to
be accurate. I just want to know the ballpark numbers.
Thanks.
--
tejun