Re: [PATCH v2] mm: emit tracepoint when RSS changes by threshold

From: Tom Zanussi
Date: Thu Sep 05 2019 - 16:32:38 EST


Hi,

On Thu, 2019-09-05 at 13:24 -0700, Daniel Colascione wrote:
> On Thu, Sep 5, 2019 at 12:56 PM Tom Zanussi <zanussi@xxxxxxxxxx>
> wrote:
> > On Thu, 2019-09-05 at 13:51 -0400, Joel Fernandes wrote:
> > > On Thu, Sep 05, 2019 at 01:47:05PM -0400, Joel Fernandes wrote:
> > > > On Thu, Sep 05, 2019 at 01:35:07PM -0400, Steven Rostedt wrote:
> > > > >
> > > > >
> > > > > [ Added Tom ]
> > > > >
> > > > > On Thu, 5 Sep 2019 09:03:01 -0700
> > > > > Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > > >
> > > > > > On Thu, Sep 5, 2019 at 7:43 AM Michal Hocko <mhocko@kernel.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > [Add Steven]
> > > > > > >
> > > > > > > On Wed 04-09-19 12:28:08, Joel Fernandes wrote:
> > > > > > > > On Wed, Sep 4, 2019 at 11:38 AM Michal Hocko <mhocko@kernel.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Wed 04-09-19 11:32:58, Joel Fernandes wrote:
> > > > > > >
> > > > > > > [...]
> > > > > > > > > > but also for reducing tracing noise. Flooding the traces
> > > > > > > > > > makes them less useful for long traces and post-processing
> > > > > > > > > > of traces. IOW, the overhead reduction is a bonus.
> > > > > > > > >
> > > > > > > > > This is not really anything special for this tracepoint,
> > > > > > > > > though. Basically any tracepoint in a hot path is in the
> > > > > > > > > same situation, and I do not see why each of them should
> > > > > > > > > invent its own way to throttle. Maybe there is some way to
> > > > > > > > > do that in the tracing subsystem directly.
> > > > > > > >
> > > > > > > > I am not sure if there is a way to do this easily. Add to
> > > > > > > > that the fact that you still have to call into trace events.
> > > > > > > > Why call into it at all, if you can filter in advance and
> > > > > > > > have a sane filtering default?
> > > > > > > >
> > > > > > > > The bigger improvement with the threshold is that the number
> > > > > > > > of trace records is almost halved: it went from 4.6K to 2.6K.
> > > > > > >
> > > > > > > Steven, would it be feasible to add a generic tracepoint
> > > > > > > throttling?
> > > > > >
> > > > > > I might misunderstand this, but is the issue here actually
> > > > > > throttling of the sheer number of trace records, or tracing
> > > > > > large enough changes to RSS that the user might care about?
> > > > > > Small changes happen all the time but we are likely not
> > > > > > interested in those. Surely we could postprocess the traces
> > > > > > to extract changes large enough to be interesting, but why
> > > > > > capture uninteresting information in the first place? IOW
> > > > > > the throttling here should be based not on the time between
> > > > > > traces but on the amount of change of the traced signal.
> > > > > > Maybe a generic facility like that would be a good idea?
> > > > >
> > > > > You mean like add a trigger (or filter) that only traces if a
> > > > > field has changed since the last time the trace was hit? Hmm, I
> > > > > think we could possibly do that. Perhaps even now with histogram
> > > > > triggers?
> > > >
> > > >
> > > > Hey Steve,
> > > >
> > > > Something like an analog-to-digital conversion function where you
> > > > lose the granularity of the signal depending on how much trace data:
> > > > https://www.globalspec.com/ImageRepository/LearnMore/20142/9ee38d1a85d37fa23f86a14d3a9776ff67b0ec0f3b.gif
> > >
> > > s/how much trace data/what the resolution is/
> > >
> > > > so like, if you had a counter incrementing with values after the
> > > > increments as: 1,3,4,8,12,14,30 and say 5 is the threshold at which
> > > > to emit a trace, then you would get 1,8,12,30.
> > > >
> > > > So I guess what is needed is a way to reduce the quantity of trace
> > > > data this way. For this use case, the user mostly cares about spikes
> > > > in the counter changing that accurate values of the different points.
> > >
> > > s/that accurate/than accurate/
> > >
> > > I think Tim, Suren, Dan and Michal are all saying the same thing as
> > > well.
> > >
> >
> > There's not a way to do this using existing triggers (histogram
> > triggers have an onchange() that fires on any change, but that doesn't
> > help here), and I wouldn't expect there to be - these sound like very
> > specific cases that would never have support in the simple trigger
> > 'language'.
>
> I don't see the filtering under discussion as some "very specific"
> esoteric need. You need this general kind of mechanism any time you
> want to monitor at low frequency a thing that changes at high
> frequency. The general pattern isn't specific to RSS or even memory in
> general. One might imagine, say, wanting to trace large changes in TCP
> window sizes. Any time something in the kernel has a "level" and that
> level changes at high frequency and we want to learn about big swings
> in that level, the mechanism we're talking about becomes useful. I
> don't think it should be out of bounds for the histogram mechanism,
> which is *almost* there right now. We already have the ability to
> accumulate values derived from ftrace events into tables keyed on
> various fields in these events and things like onmax().
>
> > On the other hand, I have been working on something that should give
> > you the ability to do something like this, by writing a module that
> > hooks into arbitrary trace events, accessing their fields, building up
> > any needed state across events, and then generating synthetic events
> > as needed:
>
> You might as well say we shouldn't have tracepoints at all and that
> people should just write modules that kprobe what they need. :-) You
> can reject *any* kernel interface by suggesting that people write a
> module to do that thing. (You could also probably do something with
> eBPF.) But there's a lot of value to having an easy-to-use
> general-purpose mechanism that doesn't make people break out the
> kernel headers and a C compiler.

Oh, I didn't mean to reject any interface - I guess I should go read
the whole thread then, and find the interface you're talking about.

Tom
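
For reference, the counter example quoted above (1,3,4,8,12,14,30 with a
threshold of 5 giving 1,8,12,30) can be reproduced with a small userspace
sketch. This is only an illustration of the filtering rule, not kernel code
and not what the patch implements; the function names quantized_samples and
delta_samples are made up here. It assumes the "analog-to-digital" reading,
where a sample is kept only when value // step moves to a different level,
and contrasts it with a "changed by at least step since the last kept
sample" rule, which gives a slightly different result on the same input.

```python
def quantized_samples(values, step):
    """Keep a value only when its quantization level (value // step)
    differs from the level of the last kept value."""
    last_level = None
    for value in values:
        level = value // step
        if level != last_level:
            last_level = level
            yield value


def delta_samples(values, step):
    """Keep a value only when it differs from the last kept value
    by at least `step` (the first value is always kept)."""
    last_kept = None
    for value in values:
        if last_kept is None or abs(value - last_kept) >= step:
            last_kept = value
            yield value


counter = [1, 3, 4, 8, 12, 14, 30]
print(list(quantized_samples(counter, 5)))  # [1, 8, 12, 30]
print(list(delta_samples(counter, 5)))      # [1, 8, 14, 30]
```

Only the quantized rule matches the 1,8,12,30 sequence from the example; the
delta rule drops 12 (it is only 4 above 8) and keeps 14 instead. Which of the
two a generic trigger should offer is a design choice the thread leaves open.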