Re: [PATCH 4.4 014/134] perf tools: Make perf_event__synthesize_mmap_events() scale

From: Ben Hutchings
Date: Thu Mar 29 2018 - 12:14:04 EST


On Mon, 2018-03-19 at 19:04 +0100, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Stephane Eranian <eranian@xxxxxxxxxx>
>
>
> [ Upstream commit 88b897a30c525c2eee6e7f16e1e8d0f18830845e ]
>
> This patch significantly improves the execution time of
> perf_event__synthesize_mmap_events() when running perf record on systems
> where processes have lots of threads.
>
> It just happens that cat /proc/pid/maps support uses a O(N^2) algorithm to
> generate each map line in the maps file.  If you have 1000 threads, then you
> have necessarily 1000 stacks.  For each vma, you need to check if it
> corresponds to a thread's stack.  With a large number of threads, this can take
> a very long time. I have seen latencies >> 10mn.
>
> As of today, perf does not use the fact that a mapping is a stack, therefore we
> can work around the issue by using /proc/pid/tasks/pid/maps.  This entry does
> not try to map a vma to stack and is thus much faster with no loss of
> functionality.
>
> The proc-map-timeout logic is kept in case users still want some upper limit.
>
> In V2, we fix the file path from /proc/pid/tasks/pid/maps to actual
> /proc/pid/task/pid/maps, tasks -> task.  Thanks Arnaldo for catching this.
>
> Committer note:
>
> This problem seems to have been eliminated in the kernel since commit:
> b18cb64ead40 ("fs/proc: Stop trying to report thread stacks").
[...]

I don't think so. It looks like this was fixed by commit 65376df58217
("proc: revert /proc/<pid>/maps [stack:TID] annotation") which we
already have in 4.4-stable. But older branches (3.16, 3.18, 4.1) don't
have that and probably should do.

It looks like commit b18cb64ead40 ("fs/proc: Stop trying to report
thread stacks") is also a candidate for stable.
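
For reference, the workaround described in the quoted commit message
amounts to reading the per-task maps file (/proc/pid/task/pid/maps)
instead of /proc/pid/maps. A minimal sketch in C of that path choice
follows. The helper names task_maps_path() and open_task_maps() are
hypothetical and are not taken from the perf sources; the actual patch
may build the path differently.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the perf sources): format the per-task
 * maps path for the thread whose TID equals the given PID, i.e. the
 * thread group leader.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
int task_maps_path(char *buf, size_t len, pid_t pid)
{
	int n = snprintf(buf, len, "/proc/%d/task/%d/maps",
			 (int)pid, (int)pid);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: open the per-task maps file for reading.
 * Returns NULL if the path does not fit or the file cannot be opened
 * (for example, the process has exited or /proc is not mounted).
 */
FILE *open_task_maps(pid_t pid)
{
	char path[64];

	if (task_maps_path(path, sizeof(path), pid) < 0)
		return NULL;
	return fopen(path, "r");
}
```

A caller would then read the returned stream line by line, exactly as
it would for /proc/pid/maps, and fclose() it when done.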

Ben.

--
Ben Hutchings
Software Developer, Codethink Ltd.