[EXAMPLE CODE] Parasite thread injection using PTRACE_SEIZE and friends

From: Tejun Heo
Date: Wed Jul 20 2011 - 10:00:46 EST


Hello,

This has taken much longer than expected (which BTW is usually
expected) but ptrace fixes and new features are mostly complete now.
They're sitting in Oleg's ptrace branch waiting for the merge window.

http://git.kernel.org/?p=linux/kernel/git/oleg/misc.git;a=shortlog;h=refs/heads/ptrace
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git ptrace

With new ptrace requests, a process can be captured and manipulated
practically transparently. Other than syscall retry or -EINTR failure
in special cases and timing difference, everything including job
control stop state stays transparent across ptrace operations.
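The capture step can be sketched roughly as follows. This is a minimal
sketch, not code taken from the ptrace-parasite tarball: it assumes the
PTRACE_SEIZE and PTRACE_INTERRUPT requests as exposed by a current
<sys/ptrace.h>, and the interface in the ptrace branch referenced above
may differ in detail. seize_and_trap() is a hypothetical helper name.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Attach to @tid without stopping it (PTRACE_SEIZE), then ask it to
 * trap (PTRACE_INTERRUPT) and wait for the resulting stop.  No SIGSTOP
 * is sent to the tracee at any point.
 *
 * Returns 0 once the tracee is trapped, -1 on failure.  On failure
 * after a successful seize the tracee stays attached until the tracer
 * detaches or exits.
 */
static int seize_and_trap(pid_t tid)
{
	int status;

	if (ptrace(PTRACE_SEIZE, tid, NULL, NULL) == -1)
		return -1;
	if (ptrace(PTRACE_INTERRUPT, tid, NULL, NULL) == -1)
		return -1;
	/* __WALL so that clone()d threads are waited for as well */
	if (waitpid(tid, &status, __WALL) != tid)
		return -1;
	return WIFSTOPPED(status) ? 0 : -1;
}
```

Once the work is done, PTRACE_DETACH on the trapped thread lets it
continue from where it was interrupted.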

One of the concerns raised about using ptrace for CR was that it
doesn't have access to states which are visible only to the process
being checkpointed and exporting all such information outside would
be too laborious. The attached ptrace-parasite example code
demonstrates how this can be solved. Using new ptrace requests, it
inserts a parasite thread into the host process transparently. The
code is also available in the following git branch (the first link is
the code browser, the second is the git branch you can clone from).

http://code.google.com/p/ptrace-parasite/source/browse/
https://code.google.com/p/ptrace-parasite/ ptrace-parasite

It only works on x86-64 and requires Oleg's ptrace branch. 'make'
produces two binaries - simple-host and parasite. If you run
simple-host in a terminal and run parasite with the pid (thread 00's
tid) of the simple-host in another terminal, you should see something
like the following.

# ./simple-host
thread 01(4580): alive
thread 02(4581): alive
thread 03(4582): alive
thread 04(4583): alive
thread 00(4579): alive
hello, world!
parasite: hello, world!
parasite: tid / time = 4629 / 1311169280
thread 03(4582): alive
thread 02(4581): alive
thread 01(4580): alive
thread 04(4583): alive
thread 00(4579): alive
...

# ./parasite 4579
Seizing 4579
Seizing 4580
Seizing 4581
Seizing 4582
Seizing 4583
executing test blob
blocking all signals = 0, prev_sigmask 0
executing mmap blob = 0x7fc4eca35000
executing clone blob = 4629
executing parasite
executing munmap blob = 0
restoring sigmask = 0, prev_sigmask 0xfffffffffffbfeef

The first "hello, world!" is printed by the infected host thread which
is then directed to block all signals, mmap an area and clone the
parasite thread. The lines which start with "parasite: " are printed
by the new parasite thread. While the parasite is running, all host
threads are ptrace trapped and when they're resumed they have no way
to find out what happened to their precious process. Note that the
host can be any program.
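The order of operations the infected thread goes through can be
mimicked in an ordinary program. The sketch below is only an
illustration of that order (block signals, mmap, clone, munmap, restore
sigmask); it is not the injected blob code. It assumes a plain CLONE_VM
child instead of a real thread so that glibc's clone() wrapper can be
used safely, and run_steps(), parasite_fn() and AREA_SIZE are
hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define AREA_SIZE	(64 * 1024)	/* assumed size of the scratch area */

static volatile int parasite_ran;

/* Stand-in for the parasite body; shares memory with the caller. */
static int parasite_fn(void *arg)
{
	(void)arg;
	parasite_ran = 1;
	return 0;
}

static int run_steps(void)
{
	sigset_t all, prev;
	void *area;
	pid_t tid;
	int ret = -1;

	/* 1. block all signals, remembering the previous mask */
	sigfillset(&all);
	if (sigprocmask(SIG_SETMASK, &all, &prev) == -1)
		return -1;

	/* 2. mmap an area to hold the new task's stack */
	area = mmap(NULL, AREA_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		goto restore;

	/* 3. clone into the area (stack grows down) and wait for it */
	tid = clone(parasite_fn, (char *)area + AREA_SIZE,
		    CLONE_VM | SIGCHLD, NULL);
	if (tid != -1 && waitpid(tid, NULL, 0) == tid)
		ret = 0;

	/* 4. unmap the area */
	munmap(area, AREA_SIZE);
restore:
	/* 5. put the original signal mask back */
	sigprocmask(SIG_SETMASK, &prev, NULL);
	return ret;
}
```

In the injected case each of these steps is a separate blob executed in
the trapped host thread, which matches the "executing ... blob" lines
in the parasite output above.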

The implementation is naive and simplistic, especially the part
which seizes all threads belonging to the target process, but it
should be enough to demonstrate how this can be done.
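One simple way to find the threads to seize is to read
/proc/<pid>/task. The following is a minimal sketch under that
assumption and is not necessarily how the attached code does it;
list_tids() is a hypothetical helper. A single pass like this is racy
because the target can create threads during the scan, so a more
careful version would seize what it found and rescan until no new tids
show up.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill @tids with up to @max thread ids of process @pid.
 * Returns the number of tids stored, or -1 if the task directory
 * cannot be opened.
 */
static int list_tids(pid_t pid, pid_t *tids, int max)
{
	char path[64];
	struct dirent *de;
	DIR *dir;
	int n = 0;

	snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while (n < max && (de = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(de->d_name, &end, 10);

		/* skip "." and ".." and anything else non-numeric */
		if (end == de->d_name || *end != '\0')
			continue;
		tids[n++] = (pid_t)tid;
	}
	closedir(dir);
	return n;
}
```

Each returned tid can then be seized individually, which is what the
"Seizing 4579" ... "Seizing 4583" lines above correspond to.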

I'm sure there still are a lot of things missing for reasonable
userland CR but I think this should at least provide the core process
capturing part of it and make the whole thing more feasible.

One missing piece is that it can't operate on a process which is
already being ptraced. Adding nested ptrace would solve some part of
it but it leads to a lot of complexity, most of it stemming from the
fact that it diversifies the places target processes may be trapped
at. Determining the exact trap point isn't that difficult but rolling
back from and restoring to some of those debug traps can be difficult
or even impossible. For debugger checkpointing, a more cooperative
approach would probably make more sense. Anyways, I don't think this
is too big a deal at this point.

Thank you.

--
tejun

Attachment: ptrace-parasite.tar.gz
Description: GNU Zip compressed data