On Thu, Jul 03, 2025 at 09:38:08PM +0800, Chen, Yu C wrote:
Hi Peter,
On 7/3/2025 8:36 PM, Peter Zijlstra wrote:
On Thu, Jul 03, 2025 at 05:20:47AM -0700, Libo Chen wrote:
I agree. The other parts, schedstat and vmstat, are still quite helpful.
Also, tracepoints are more expensive than counters once enabled; I think
that's too much overhead just for counting events.
I'm not generally a fan of eBPF, but supposedly it is really good for
stuff like this.
Attaching to a tracepoint and distributing into cgroup buckets seems
like it should be a trivial script.
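For example, something along these lines (an untested sketch, assuming
bpftrace and root; sched:sched_move_numa and sched:sched_swap_numa are the
existing NUMA-balancing trace events, and `cgroup` is bpftrace's builtin
for the current task's cgroup id):

```
# Count NUMA-balancing task migrations and swaps, bucketed per cgroup.
bpftrace -e '
  tracepoint:sched:sched_move_numa { @moves[cgroup] = count(); }
  tracepoint:sched:sched_swap_numa { @swaps[cgroup] = count(); }
'
```

Only people who actually run the script pay the collection cost, which is
rather the point.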
Yes, it is feasible to use eBPF. On the other hand, if existing
monitoring programs rely on /proc/{pid}/sched to observe the
NUMA balancing metrics of processes, it might be helpful to
include the NUMA migration/swap information in /proc/{pid}/sched.
That would minimize the modifications needed in those monitoring
programs and avoid having to add a new BPF script that gathers the
NUMA balancing statistics from a different source, IMHO.
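To illustrate what such a monitoring program does today, here is a sketch
that extracts a NUMA-balancing counter from /proc/<pid>/sched-style text.
The field name numa_pages_migrated matches what the kernel prints under
CONFIG_NUMA_BALANCING; the sample values below are made up:

```shell
# Hypothetical excerpt of /proc/<pid>/sched (values invented for the example).
sample='numa_pages_migrated                          :                  128
total_numa_faults                            :                   42'

# Split on ':' and strip whitespace, as a monitoring tool might.
migrated=$(printf '%s\n' "$sample" | awk -F: '/numa_pages_migrated/ {gsub(/ /,"",$2); print $2}')
echo "$migrated"   # prints 128
```

Keeping the per-process counters in that file means scripts like this keep
working unchanged.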
Maybe...
The thing is, most of the time the effort spent on collecting all these
numbers is wasted energy, since nobody ever looks at them.
Sometimes we're stuck with ABI, like the proc files you mentioned. We
can't readily remove them, stuff would break. But does that mean we
should endlessly add to them just because it's convenient?

Ideally I would strip out all the statistics and accounting crap and
make sure we have tracepoints (not trace-events) covering all the needed
spots, and then maybe, just maybe, have a few kernel modules that hook
into those tracepoints to provide the legacy interfaces.
That way, only the people that care get to pay the overhead of actually
collecting the numbers.
One can dream I suppose... :-)