Re: [RFC PATCH 1/6] perf: Move mlock accounting to ring buffer allocation

From: Peter Zijlstra
Date: Fri Sep 23 2016 - 16:28:42 EST


On Fri, Sep 23, 2016 at 10:26:15AM -0700, Andi Kleen wrote:
> > Afaict there's no actual need to hide the AUX buffer for this sampling
> > stuff; the user knows about all this and can simply mmap() the AUX part.
> > The sample could either point to locations in the AUX buffer, or (as I
> > think this code does) memcpy bits out.
>
> This would work for perf, but not for the core dump case below.
>
> > Ideally we'd pass the AUX-event into the syscall, that way you avoid all
> > the find_aux_event crud. I'm not sure we want to overload the group_fd
> > thing more (it's already very hard to create counter groups in a cgroup
> > for example) ..
> >
> > Coredump was mentioned somewhere, but I'm not sure I've seen
> > code/interfaces for that. How was that envisioned to work?
>
> The idea was to have an rlimit that enables PT running as a ring buffer
> in the background. If something crashes the ring buffer is dumped
> as part of the core dump, and then gdb can tell you how you crashed.
> This extends what gdb already does explicitly today using perf
> API calls.

Well, we could 'force' inject a VMA into the process's address space, we
do that for a few other things as well. It also makes for fewer
exceptions in the actual core dumping code.

But the worry I have is the total amount of pinned memory. If you want
to inherit this on fork(), as is a reasonable expectation, then it's
possible to quickly exceed the total amount of pinnable memory.

At which point we _should_ start failing fork(), which is a somewhat
unexpected and undesirable side effect.

Ideally we'd unpin the old buffers and repin the new buffers on context
switch, but that's impossible: faulting the pages back in needs to
schedule, which recurses into the context switch itself, so we lose.

I really want to see something sensible before we go do that.