Re: [PATCH 5/5] uprobes: Change uprobe_copy_process() to dup xol_area

From: Peter Zijlstra
Date: Mon Oct 14 2013 - 10:10:04 EST


On Sun, Oct 13, 2013 at 09:18:44PM +0200, Oleg Nesterov wrote:
> This finally fixes the serious bug in uretprobes: a forked child
> crashes if the parent called fork() with a pending ret probe.
>
> Trivial test-case:
>
> # perf probe -x /lib/libc.so.6 __fork%return
> # perf record -e probe_libc:__fork perl -le 'fork || print "OK"'
>
> (the child doesn't print "OK", it is killed by SIGSEGV)
>
> If the child returns from the probed function, it actually returns
> to trampoline_vaddr, because it got a copy of the parent's stack
> mangled by prepare_uretprobe() when the parent entered this function.
>
> It crashes because a) this address is not mapped and b) until the
> previous change it did not have the proper ->return_instances info.
>
> This means that uprobe_copy_process() has to create an xol_area
> which has the trampoline slot, and its vaddr should be equal to the
> parent's xol_area->vaddr.
>
> Unfortunately, uprobe_copy_process() cannot simply do
> __create_xol_area(child, xol_area->vaddr). This could actually work,
> but perf_event_mmap() doesn't expect to be used with a foreign ->mm.
> So we offload this to task_work_run(), and pass the argument via the
> not yet used utask->vaddr.
>
> We know that this vaddr is fine for install_special_mapping(): the
> necessary hole was recently "created" by dup_mmap(), which skips the
> parent's VM_DONTCOPY area, and nobody else could use the new mm.
>
> Unfortunately, this also means that we cannot handle errors
> properly; we obviously cannot abort the already completed fork().
> So we simply print a warning if the GFP_KERNEL allocation fails
> (the only possible reason for failure).
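
As an illustration of the failure mode described above, here is a user-space toy model in C, not kernel code. Every name in it (struct toy_task, toy_prepare_uretprobe(), toy_fork_naive(), toy_fork_dup_xol(), toy_return(), TRAMPOLINE_VADDR) is hypothetical, and a task is reduced to a single saved return address. The point is only that the child inherits the hijacked return address through its copied stack, so it needs both the saved ->return_instances and a mapping at the parent's trampoline vaddr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address of the uretprobe trampoline slot in the XOL area. */
#define TRAMPOLINE_VADDR 0x7fff0000u
#define MAX_INSTANCES 8

/* Toy task: one saved return address on its "stack", a stack of
 * original return addresses (return_instances), and whether an XOL
 * area holding the trampoline is mapped at TRAMPOLINE_VADDR. */
struct toy_task {
	uintptr_t saved_ret;
	uintptr_t return_instances[MAX_INSTANCES];
	size_t depth;
	bool xol_mapped;
};

/* On entry to the probed function: remember the original return
 * address and replace it with the trampoline address. */
static void toy_prepare_uretprobe(struct toy_task *t)
{
	assert(t->depth < MAX_INSTANCES);
	t->return_instances[t->depth++] = t->saved_ret;
	t->saved_ret = TRAMPOLINE_VADDR;
	t->xol_mapped = true;
}

/* Naive fork: the child gets a copy of the stack memory, but neither
 * the VM_DONTCOPY XOL mapping nor the return_instances bookkeeping. */
static struct toy_task toy_fork_naive(const struct toy_task *parent)
{
	struct toy_task child = { .saved_ret = parent->saved_ret };
	return child;
}

/* Fixed fork (toy): duplicate the bookkeeping and map XOL at the same vaddr. */
static struct toy_task toy_fork_dup_xol(const struct toy_task *parent)
{
	return *parent;
}

/* Return from the probed function. Yields the address execution resumes
 * at, or 0 to model the SIGSEGV. */
static uintptr_t toy_return(struct toy_task *t)
{
	if (t->saved_ret != TRAMPOLINE_VADDR)
		return t->saved_ret;
	if (!t->xol_mapped)      /* a) the trampoline address is not mapped */
		return 0;
	if (t->depth == 0)       /* b) no saved original return address */
		return 0;
	return t->return_instances[--t->depth];
}
```

With the naive fork, toy_return() on the child yields 0; with the duplicating fork it resumes at the original caller. In the actual series these two pieces come from separate patches (the previous change for ->return_instances, this one for xol_area).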
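
The deferral to task_work_run() can likewise be modeled in user space. This sketch rests on assumptions: struct toy_work only mimics the idea of a queued callback, and toy_task_work_add()/toy_task_work_run()/toy_copy_process() are hypothetical stand-ins, not the kernel's task_work API. The fork path only stashes the parent's vaddr in the child's utask and queues a callback; the child runs it later in its own context, where creating the mapping and reporting it (the mmap_events counter stands in for perf_event_mmap()) no longer involves a foreign mm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_task;

/* Toy analogue of a queued deferred callback (singly linked). */
struct toy_work {
	struct toy_work *next;
	void (*func)(struct toy_work *work, struct toy_task *self);
};

struct toy_utask {
	uintptr_t vaddr;              /* parent's XOL vaddr, stashed at fork */
	struct toy_work dup_xol_work;
};

struct toy_task {
	struct toy_utask utask;
	struct toy_work *works;       /* pending callbacks */
	uintptr_t xol_vaddr;          /* 0 until this task has its own XOL area */
	int mmap_events;              /* stands in for perf_event_mmap() calls */
};

/* Runs in the child's own context, so its own mm is the current one. */
static void toy_dup_xol(struct toy_work *work, struct toy_task *self)
{
	(void)work;
	assert(self->xol_vaddr == 0); /* the child has no XOL area yet */
	self->xol_vaddr = self->utask.vaddr;  /* same vaddr as the parent */
	self->mmap_events++;          /* reported against the task's own mm */
}

/* Queue a callback on a task without touching its mm. */
static void toy_task_work_add(struct toy_task *t, struct toy_work *w)
{
	w->next = t->works;
	t->works = w;
}

/* Executed by the task itself, e.g. on its way back to user mode. */
static void toy_task_work_run(struct toy_task *self)
{
	while (self->works) {
		struct toy_work *w = self->works;
		self->works = w->next;
		w->func(w, self);
	}
}

/* Parent side of fork: record the vaddr and defer the real work. */
static void toy_copy_process(const struct toy_task *parent, struct toy_task *child)
{
	child->utask.vaddr = parent->xol_vaddr;
	child->utask.dup_xol_work.func = toy_dup_xol;
	toy_task_work_add(child, &child->utask.dup_xol_work);
}
```

Because the callback runs after fork() has already returned, an allocation failure inside it cannot be propagated back to the fork; that is why only a warning is possible.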

Oh cute... so we could actually ignore this perf_event_mmap(), because we
got one for the parent when we inserted the probe, and the perf tools
assume the child's mm layout is identical to the parent's (they don't
actually see the VM_DONTCOPY bit).

So we could add: 'if (vma->vm_mm != current->mm) return;' to
perf_event_mmap() with a very big nasty comment.
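
A sketch of that guard as a user-space mock: toy_current stands in for current, and struct toy_mm/struct toy_vma/struct toy_task are hypothetical reductions of the kernel structures. Only the condition itself comes from the mail; the event counter exists just to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm { int id; };
struct toy_vma { struct toy_mm *vm_mm; unsigned long vm_start; };
struct toy_task { struct toy_mm *mm; };

/* Stand-in for the kernel's "current". */
static struct toy_task *toy_current;
static int toy_events_emitted;

static void toy_perf_event_mmap(struct toy_vma *vma)
{
	assert(toy_current != NULL);
	/*
	 * Proposed guard: do not report a vma that belongs to a foreign mm
	 * (e.g. the child's XOL area created from the parent's fork path).
	 * perf already saw the parent's XOL mapping at the same address when
	 * the probe was inserted, and the tools assume the child inherits
	 * the parent's layout.
	 */
	if (vma->vm_mm != toy_current->mm)
		return;
	toy_events_emitted++;
}
```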

That said, should we hide the XOL vma from perf altogether? That is,
hits from the XOL table will greatly obfuscate the perf data, as we've
got no means of mapping them back to an instruction.

We could transform the perf IP from XOL areas back to the original
instruction site. The only side effect is that, since executing the XOL
code is far more expensive than the original single instruction, that
instruction will appear much more expensive than expected.
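
One way such an IP transformation might look, assuming (hypothetically) that each mm records which original instruction address every XOL slot currently holds and that slots have a fixed size. Neither the struct toy_xol_area layout nor TOY_SLOT_BYTES/TOY_NR_SLOTS reflects the actual uprobes implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOT_BYTES 128u  /* hypothetical slot size */
#define TOY_NR_SLOTS   32u   /* hypothetical number of slots */

/* Hypothetical per-mm record of which probed instruction each XOL slot
 * currently holds; 0 means "no instruction site recorded". */
struct toy_xol_area {
	uintptr_t vaddr;                  /* start of the XOL mapping */
	uintptr_t orig_ip[TOY_NR_SLOTS];  /* original instruction address */
};

/* Rewrite a sampled IP inside the XOL area to the address of the
 * instruction it is executing on behalf of. IPs outside the area, or in
 * slots with no recorded site, are returned unchanged. */
static uintptr_t toy_fixup_sample_ip(const struct toy_xol_area *area, uintptr_t ip)
{
	assert(area != NULL);
	if (ip < area->vaddr ||
	    ip - area->vaddr >= (uintptr_t)TOY_SLOT_BYTES * TOY_NR_SLOTS)
		return ip;
	size_t slot = (ip - area->vaddr) / TOY_SLOT_BYTES;
	return area->orig_ip[slot] ? area->orig_ip[slot] : ip;
}
```

The trade-off described above remains: every sample taken while executing a slot is charged to the original instruction.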

Thoughts?