Re: [PATCH v5 18/20] perf: Allocate ring buffers for inherited per-task kernel events

From: Alexander Shishkin
Date: Fri Oct 24 2014 - 03:45:01 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> On Mon, Oct 13, 2014 at 04:45:46PM +0300, Alexander Shishkin wrote:
>> Normally, per-task events can't inherit their parents' ring buffers, to
>> avoid multiple events contending for the same buffer. And since buffer
>> allocation is typically done by the userspace consumer, there is no
>> practical interface to allocate new buffers for inherited counters.
>>
>> However, for kernel users we can allocate new buffers for inherited
>> events as soon as they are created (and also reap them on event
>> destruction). This pattern has a number of use cases, such as event
>> sample annotation and process core dump annotation.
>>
>> When a new event is inherited from a per-task kernel event that has a
>> ring buffer, allocate a new buffer for this event so that data from the
>> child task is collected and can later be retrieved for sample annotation
>> or core dump inclusion. This ring buffer is released when the event is
>> freed, for example, when the child task exits.
>>
>
> This causes a pinned memory explosion, not at all nice that.
>
> I think I see why and all, but it would be ever so good to not have to
> allocate so much memory.

Are there any controls we could use to limit such memory usage?
Theoretically, the buffers that we'd allocate for this are way smaller
than, for example, what we use if we try to capture a complete trace,
since we'd only be interested in the most recent trace data. For example,
we already have RLIMIT_NPROC, which implicitly limits the number of these
buffers. Or maybe we can introduce a new rlimit/sysctl/whatnot that
would limit the maximum amount of such memory per-cpu/system/user. What
do you think?
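
As a rough illustration of the RLIMIT_NPROC point, the worst case can be
estimated as one buffer per task the user is allowed to create. This is
only a userspace sketch: the function names and the buffer size in pages
are assumptions made for the example, not anything from the patch set.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Upper bound on pinned memory if every task a user may create
 * owned one inherited buffer. All three inputs are illustrative.
 */
uint64_t worst_case_pinned_bytes(uint64_t max_tasks,
                                 uint64_t pages_per_buffer,
                                 uint64_t page_size)
{
        assert(page_size > 0);
        return max_tasks * pages_per_buffer * page_size;
}

/* Soft RLIMIT_NPROC of the caller, or 0 if it is unlimited or unreadable. */
uint64_t nproc_soft_limit(void)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NPROC, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
                return 0;
        return (uint64_t)rl.rlim_cur;
}
```

With a hypothetical limit of 1024 tasks and 4 pages of 4096 bytes per
buffer, the bound comes to 16 MiB of pinned memory for that user.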

Per-cpu buffers with inheritance would solve this problem, but they raise
other issues: we'd need sched_switch again to tell the traces apart, and
since those buffers run in overwrite mode, a cpu hog task can
potentially overwrite any useful trace data.
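
The overwrite problem can be shown with a toy model of a shared per-cpu
buffer. This is a simplified sketch, not the perf ring buffer: the slot
count, the struct layout and the pid tag (standing in for what
sched_switch would provide) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8    /* illustrative size, not a real perf value */

struct record {
        int pid;        /* which task produced the record */
        int valid;      /* 0 until the slot has been written */
};

struct ring {
        struct record slot[RING_SLOTS];
        size_t head;    /* total records written so far */
};

/* Overwrite mode: the newest record always replaces the oldest slot. */
void ring_write(struct ring *r, int pid)
{
        assert(r != NULL);
        r->slot[r->head % RING_SLOTS] = (struct record){ .pid = pid, .valid = 1 };
        r->head++;
}

/* Count the records from a given task that are still in the buffer. */
size_t ring_count_for(const struct ring *r, int pid)
{
        size_t i, n = 0;

        for (i = 0; i < RING_SLOTS; i++)
                if (r->slot[i].valid && r->slot[i].pid == pid)
                        n++;
        return n;
}

/* Task 1 writes two records, then a hog (task 2) fills the whole ring. */
size_t demo_hog_overwrites(void)
{
        struct ring r = { 0 };
        int i;

        ring_write(&r, 1);
        ring_write(&r, 1);
        for (i = 0; i < RING_SLOTS; i++)
                ring_write(&r, 2);
        return ring_count_for(&r, 1);
}
```

In this model nothing from task 1 survives once the hog has written a
full buffer's worth of records, which is the case that per-task buffers
avoid.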

Regards,
--
Alex