Re: [PATCH 1/2] perf: Add munmap callback

From: Stephane Eranian
Date: Mon Nov 05 2018 - 05:59:24 EST


Hi Kan,

I built a small test case for you to demonstrate the issue for code and data.
Compile the test program and then do:
For text:
$ perf record ./mmap
$ perf report -D | fgrep MMAP2

The test program mmaps 2 pages, unmaps the second, and remap 1 page
over the freed space.
If you look at the MMAP2 record, you will not be able to reconstruct
what happened and perf will
get confused should it try to symbolize from the address range

With Text:
PERF_RECORD_MMAP2 5937/5937: [0x400000(0x1000) @ 0 08:01 400938
824817672]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
00:00 0 0]: rwxp //anon
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
00:00 0 0]: rwxp //anon

^^^^^^^^^^^^^^^^^^^^^^^^ captures the whole VMA but not the mapping
change in user space

For data:
$ perf record -d ./mmap
$ perf report -D | fgrep MMAP2
With data:
PERF_RECORD_MMAP2 6430/6430: [0x400000(0x1000) @ 0 08:01 400938
3278843184]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
00:00 0 0]: rw-p //anon
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
00:00 0 0]: rw-p //anon

Same test case with data.
Perf will think the entire 2 pages have been replaced when in fact
only the second has.
I believe the problem is likely to impact data and jitted code cache

#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <err.h>
#include <getopt.h>

int main(int argc, char **argv)
{
void *addr1, *addr2;
size_t pgsz = sysconf(_SC_PAGESIZE);
int n = 2;
int ret;
int c, mode = 0;

while ((c = getopt(argc, argv, "hd")) != -1) {
switch (c) {
case 'h':
printf("[-h]\tget this help\n");
printf("[-d]\tuse data mmaps (no PROT_EXEC)\n");
return 0;
case 'd':
mode = PROT_EXEC;
break;
default:
errx(1, "unknown option");
}
}
/* default to data */
if (mode == 0)
mode = PROT_WRITE;

/*
* mmap 2 contiugous pages
*/
addr1 = mmap(NULL, n * pgsz, PROT_READ| mode, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (addr1 == (void *)MAP_FAILED)
err(1, "mmap 1 failed");

printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);

/*
* unmap only the second page
*/
ret = munmap(addr1 + pgsz, pgsz);
if (ret == -1)
err(1, "munmp failed");

/*
* mmap 1 page at the location of the unmap page (should reuse virtual space)
* This creates a continuous region built from two mmaps and
potentially two different sources
* especially with jitted runtimes
*/
addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ|PROT_WRITE | mode,
MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

printf("addr2=%p\n", addr2);

if (addr2 == (void *)MAP_FAILED)
err(1, "mmap 2 failed");
if (addr2 != (addr1 + pgsz))
errx(1, "wrong mmap2 address");

sleep(1);

return 0;
}

On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On 10/24/2018 3:30 PM, Stephane Eranian wrote:
> > The need for this new record type extends beyond physical address conversions
> > and PEBS. A long while ago, someone reported issues with symbolization related
> > to perf lacking munmap tracking. It had to do with vma merging. I think the
> > sequence of mmaps was as follows in the problematic case:
> > 1. addr1 = mmap(8192);
> > 2. munmap(addr1 + 4096, 4096)
> > 3. addr2 = mmap(addr1+4096, 4096)
> >
> > If successful, that yields addr2 = addr1 + 4096 (could also get the
> > same without forcing the address).
> >
> > In that case, if I recall correctly, the vma for 1st mapping (now at
> > 4k) and that of the 2nd mapping (4k)
> > get merged into a single 8k vma and this is what perf_events will
> > record for PERF_RECORD_MMAP.
> > On the perf tool side, it is assumed that if two timestamped mappings
> > overlap then, the latter overrides
> > the former. In this case, perf would loose the mapping of the first
> > 4kb and assume all symbols comes from
> > 2nd mapping. Hopefully I got the scenario right. If so, then you'd
> > need PERF_RECORD_UNMAP to
> > disambiguate assuming the perf tool is modified accordingly.
> >
>
> Hi Stephane and Peter,
>
> I went through the link(https://lkml.org/lkml/2017/1/27/452). I'm trying
> to understand the problematic case.
>
> It looks like the issue can only be triggered by perf inject --jit.
> Because it can inject extra MMAP events.
> As my understanding, Linux kernel only try to merge VMAs if they are
> both from anon or they are both from the same file. --jit breaks the
> rule, and makes the merged VMA partly from anon, partly from file.
> Now, there is a new MMAP event which range covers the modified VMA.
> Without the help of MUNMAP event, perf tool have no idea if the new one
> is a newly merged VMA (modified VMA + a new VMA) or a brand new VMA.
> Current code just simply overwrite the modified VMAs. The VMA
> information which --jit injected may be lost. The symbolization may be
> lost as well.
>
> Except --jit, the VMAs information should be consistent between kernel
> and perf tools. We shouldn't observe the problem. MUNMAP event is not
> needed.
>
> Is my understanding correct?
>
> Do you have a test case for the problem?
>
> Thanks,
> Kan