Re: + page-owner-tracking.patch added to -mm tree

From: Ingo Molnar
Date: Wed Apr 01 2009 - 09:22:57 EST



* Mel Gorman <mel@xxxxxxxxx> wrote:

> On Wed, Apr 01, 2009 at 01:15:40PM +0200, Ingo Molnar wrote:
> > > <VAST AMOUNTS OF SNIPPAGE>
> > >
> > > +static inline void __stack_trace(struct page *page, unsigned long *stack,
> > > + unsigned long bp)
> > > +{
> > > + int i = 0;
> > > + unsigned long addr;
> > > + struct thread_info *tinfo = (struct thread_info *)
> > > + ((unsigned long)stack & (~(THREAD_SIZE - 1)));
> > > +
> > > + memset(page->trace, 0, sizeof(long) * 8);
> > > +
> > > +#ifdef CONFIG_FRAME_POINTER
> > > + if (bp) {
> > > + while (valid_stack_ptr(tinfo, (void *)bp)) {
> > > + addr = *(unsigned long *)(bp + sizeof(long));
> > > + page->trace[i] = addr;
> > > + if (++i >= 8)
> > > + break;
> > > + bp = *(unsigned long *)bp;
> > > + }
> > > + return;
> > > + }
> > > +#endif /* CONFIG_FRAME_POINTER */
> > > + while (valid_stack_ptr(tinfo, stack)) {
> > > + addr = *stack++;
> > > + if (__kernel_text_address(addr)) {
> > > + page->trace[i] = addr;
> > > + if (++i >= 8)
> > > + break;
> > > + }
> > > + }
> > > +}
> >
> > Uhm, this is not acceptable and broken, we have generic stacktrace
> > saving facilities for this precise purpose. It has other problems
> > too.
> >
> > The purpose of the patch seems genuinely useful and i support the
> > concept, but the current design is limiting (we could do much
> > better) and the implementation is awful. Please.
> >
> > Has this patch been reviewed by or Cc:-ed to anyone versed in kernel
> > instrumentation details, before it was applied to -mm? Those folks
> > hang around the tracing tree usually so they are easy to find. :)
> >
>
> This patch is ancient, predating most of the instrumentation stuff
> by years. It was dropped from -mm a few months ago because of
> changes in proc and this is a rebase because it came up as being
> potentially useful pinning down memory leaks when they occur.
>
> I'm not sure when exactly it was introduced to -mm, but I see
> references going back as far as 2.6.12-rc1 so it's no surprise
> this is now extremly odd looking. However, there is no plan to
> merge this to mainline making the effort of redoing it from
> scratch a questionable expenditure of time.

Hm, why not merge the concept upstream?

I'm sure the kmemtrace maintainers (Eduardo and Pekka) would be
interested in this too: they are already tracking slab allocation
events in a very finegrained way, extending that scheme to the page
allocator seems genuinely useful to me.

And that way the rather ugly bloating of struct page by this patch
would be solved too: it's not needed and there's no need to actually
save the callchain permanently, it's usually enough to _trace_ it -
user-space can then save it into a permanent map if it's interested.

So this feature could go into Linux distros, available by default
and integrated into the user-space side of kmemtrace.

( You could trace the gfp-type like kmemtrace does already - that
way user-space can do GFP_MOVABLE / GFP_RECLAIMABLE summary counts
(and more). )

So hooking up this info into tracepoints and ftrace seems genuinely
useful to me. You could trace the whole system, or you could use
filter expressions to do things like:

echo "comm == Xorg" > /debug/tracing/events/mm/page_alloc/filter

To only track the memory allocations of Xorg task(s) for example.

Later on a tracepoint based sw event could be made out of it too,
hooked up to sw-perfcounters, to dynamically profile page usage by
owning function(s) and task(s).

Later on we could add more info to the tracepoints - for example the
d_path() of the vma that is attached to the page could be traced (if
it's not an anonymous vma and not a kernel-internal allocation).

etc.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/