Re: [PATCH 1/2] cgroup: add cpu.stat_percpu

From: Peter Zijlstra
Date: Wed Jan 12 2022 - 03:30:51 EST


On Tue, Jan 11, 2022 at 03:38:20PM -0800, Josh Don wrote:
> On Tue, Jan 11, 2022 at 4:50 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Jan 07, 2022 at 03:41:37PM -0800, Josh Don wrote:
> >
> > > +	seq_puts(seq, "usage_usec");
> > > +	for_each_possible_cpu(cpu) {
> > > +		cached_bstat = per_cpu_ptr(&cached_percpu_stats, cpu);
> > > +		val = cached_bstat->cputime.sum_exec_runtime;
> > > +		do_div(val, NSEC_PER_USEC);
> > > +		seq_printf(seq, " %llu", val);
> > > +	}
> > > +	seq_puts(seq, "\n");
> > > +
> > > +	seq_puts(seq, "user_usec");
> > > +	for_each_possible_cpu(cpu) {
> > > +		cached_bstat = per_cpu_ptr(&cached_percpu_stats, cpu);
> > > +		val = cached_bstat->cputime.utime;
> > > +		do_div(val, NSEC_PER_USEC);
> > > +		seq_printf(seq, " %llu", val);
> > > +	}
> > > +	seq_puts(seq, "\n");
> > > +
> > > +	seq_puts(seq, "system_usec");
> > > +	for_each_possible_cpu(cpu) {
> > > +		cached_bstat = per_cpu_ptr(&cached_percpu_stats, cpu);
> > > +		val = cached_bstat->cputime.stime;
> > > +		do_div(val, NSEC_PER_USEC);
> > > +		seq_printf(seq, " %llu", val);
> > > +	}
> > > +	seq_puts(seq, "\n");
> >
> > This is an anti-pattern; given enough CPUs (easy) this will trivially
> > overflow the 1 page seq buffer.
> >
> > People are already struggling to fix existing ABI, let's not make the
> > problem worse.
>
> Is the concern there just the extra overhead from making multiple
> trips into this handler and re-allocating the buffer until it is large
> enough to take all the output? In that case, we could pre-allocate
> with a size of the right order of magnitude, similar to /proc/stat.
>
> Lack of per-cpu stats is a gap between cgroup v1 and v2, for which v2
> can easily support this interface given that it already tracks the
> stats percpu internally. I opted to dump them all in a single file
> here, to match the consolidation that occurred from cpuacct->cpu.stat.

Hmm.. fancy new stuff there :-) Yes, I think that would alleviate the
immediate problem. I suppose /proc/interrupts ought to get some of that
too.
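
To put rough numbers on the buffer concern, here is a user-space C
sketch (not kernel code). It assumes a 4096-byte initial seq buffer that
doubles each time the handler overflows it, and a worst case of 21 bytes
per CPU value (one space plus up to 20 digits for a 64-bit counter). The
helper names percpu_stat_bytes() and passes_needed() are made up for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed initial seq buffer size (one 4 KiB page) for this model. */
#define MODEL_PAGE_SIZE 4096u

/* Worst case for one " %llu" field: 1 space + 20 decimal digits. */
#define MODEL_FIELD_BYTES 21u

/*
 * Hypothetical helper: worst-case bytes for the three per-cpu lines
 * ("usage_usec", "user_usec", "system_usec"), each followed by one
 * value per CPU and a trailing newline.
 */
static size_t percpu_stat_bytes(unsigned int ncpus)
{
	static const char *const keys[] = {
		"usage_usec", "user_usec", "system_usec",
	};
	size_t total = 0;
	size_t i;

	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		total += strlen(keys[i]) + (size_t)ncpus * MODEL_FIELD_BYTES + 1;

	return total;
}

/*
 * Hypothetical helper: how many times the handler would run if the
 * buffer starts at 'initial' bytes and doubles after every overflow.
 * Output only fits when it is strictly smaller than the buffer.
 */
static unsigned int passes_needed(size_t initial, size_t needed)
{
	unsigned int passes = 1;
	size_t size = initial;

	while (needed >= size) {
		size <<= 1;
		passes++;
	}
	return passes;
}
```

Under these assumptions, passes_needed(MODEL_PAGE_SIZE,
percpu_stat_bytes(64)) is a single pass, since 64 CPUs need at most 4065
bytes. With 1024 CPUs the worst case is roughly 63 KiB and the handler
runs five times, unless the buffer is sized up front the way /proc/stat
does it with single_open_size().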

Still, I'm not sure having so much data in a single file is wise. But
I've not really kept up with the discussions around this problem much.