Re: [RFC 1/1] mm: Add per-task struct tlb counters

From: Dave Hansen
Date: Wed Sep 14 2022 - 03:41:17 EST


On 9/13/22 18:51, Joe Damato wrote:
> TLB shootdowns are tracked globally, but on a busy system it can be
> difficult to disambiguate the source of TLB shootdowns.
>
> Add two counter fields:
> - nrtlbflush: number of tlb flush events received
> - ngtlbflush: number of tlb flush events generated
>
> Expose those fields in /proc/[pid]/stat so that they can be analyzed
> alongside similar metrics (e.g. min_flt and maj_flt).

On x86 at least, we already have two other ways to count flushes. You
even quoted them with your patch:

>         count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
> +       current->ngtlbflush++;
>         if (info->end == TLB_FLUSH_ALL)
>                 trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);

Granted, the count_vm_tlb...() one is debugging only. But, did you try
to use those other mechanisms? For instance, could you patch
count_vm_tlb_event()? Why didn't the tracepoints work for you?
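
To make the tracepoint suggestion concrete, here is a rough userspace
sketch, not anything from the patch. It classifies one line of ftrace
text output for the tlb:tlb_flush event as "generated" or "received"
and pulls out the pid that hit the tracepoint. The helper and enum
names are made up for illustration. It assumes the default ftrace line
layout ("comm-pid [cpu] flags timestamp: event: ...") and the reason
strings "remote ipi send" and "remote shootdown"; check both against
the kernel you are running.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification, mirroring the two proposed counters. */
enum flush_kind { FLUSH_NONE = 0, FLUSH_GENERATED, FLUSH_RECEIVED };

/*
 * Classify one line of ftrace text output. On a match, *pid is set to
 * the pid printed in the "comm-pid" column. Assumes the default line
 * layout and the reason strings named above.
 */
static enum flush_kind classify_tlb_flush(const char *line, int *pid)
{
	const char *ev = strstr(line, " tlb_flush: ");
	const char *p, *end;

	if (!ev)
		return FLUSH_NONE;

	/* Walk back from the event name to the '[' of the "[cpu]" column. */
	for (p = ev; p > line && *p != '['; p--)
		;
	if (*p != '[')
		return FLUSH_NONE;

	/* Skip the blanks between "comm-pid" and "[cpu]". */
	while (p > line && p[-1] == ' ')
		p--;

	/* Walk back over the pid digits; they must follow a '-'. */
	end = p;
	while (p > line && isdigit((unsigned char)p[-1]))
		p--;
	if (p == end || p == line || p[-1] != '-')
		return FLUSH_NONE;
	if (sscanf(p, "%d", pid) != 1)
		return FLUSH_NONE;

	/* Sender side: the task that triggered the shootdown IPIs. */
	if (strstr(ev, "reason:remote ipi send"))
		return FLUSH_GENERATED;
	/* Receiver side: whatever task was running when the IPI landed. */
	if (strstr(ev, "reason:remote shootdown"))
		return FLUSH_RECEIVED;

	return FLUSH_NONE;
}
```

Feeding this the lines read from the tracing instance's trace_pipe and
bumping a per-pid pair of counters would give roughly the same two
numbers as the proposed task_struct fields, without touching
task_struct.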

Can this be done in a more arch-generic way? It's a shame to
unconditionally add counters to the task struct and only use them on
x86. If someone wanted to generalize the x86 tracepoints, or make them
available to other architectures, I think that would be fine even if
they have to change a bit (cue the inevitable argument about
tracepoint ABI).

P.S. I'm not a fan of the structure member naming.