Re: Adding checkpointing API to Linux kernel

Werner Krebs (werner.krebs@yale.edu)
Mon, 18 Jan 1999 18:11:04 -0500


Michael Elizabeth Chastain wrote:
>
> Hi Werner,
>
> > Andy Glew writes about the need for an NT-like (gask!) API in the
> > GNU/Linux that would all trapping of OS system calls without
> > recompiling code.
>
> First I am going to chide you for mindlessly writing "GNU/Linux"
> when you are talking about the Linux Kernel, not a whole system.
> Consider yourself chided.

Except that I've also gotten email for saying "Linux" kernel when ~it
was technically correct but nevertheless would not have hurt" to say
'GNU/Linux.'~ Believe me, folks, I appreciate the difference and I
appreciate the politics, so let's avoid this topic as it is a sore spot.

> The API exists and it's named "ptrace". Works great, too.

Yes, but we're talking about checkpoint migration of CPU and I/O
intensive jobs to boxes that have free CPU resources. ptrace() will slow
things down significantly. At least, that was my experience on the Cray
Y-MP running Cray UNICOS, where the delay from ptrace() was VERY
noticeable. With GNU/Linux, err, Linux kernel, err GNU/Linux, it is
significantly less noticeable but still there, I'll bet.

> I have a trace-and-replay program based on ptrace. The tracer is similar
> to strace.
>
> The replayer is the cool part. It takes control whenever the target
> process executes a system call, annuls the original system call, and
> overwrites the target process registers and address space with the values
> that I want to be in there.
>
> I also run gdb (or any other debugger) as a client program and filter
> all its calls to ptrace. Effectively, the replayer is a proxy server
> for ptrace. It shows gdb a picture of the target process replaying
> its execution.
>
> All this runs in user space with stock linux kernel, stock target
> binaries, and stock gdb.
>
> It's been running like this for three years. I released the source code
> under GPL in November 1995. As far as I know, three people in the entire
> world have ever run it, counting me.
>
> One of the two guys put up a mud server and traced it. He sent me
> the trace file, and I ran gdb on it. I re-executed his program, I
> set breakpoints anywhere I wanted, I inspected data at any breakpoint.
> Hmmm, there's a structure that looks funny, I'll just restart and set an
> earlier breakpoint.
>
> During those three years of no interest, the linux kernel interface has
> shifted again and again. The replayer needs a table of every system call
> and how it affects memory, and that table needs more entries every week
> (thanks to ioctl). So I have a great demo, if you have 1.3.42 kernel
> headers to compile it against.
>
> ftp://ftp.shout.net/pub/users/mec/misc/mec-0.3.tar.gz
>
> There's more.
>
> If I put memory-access rule checking in at replay time, I can do better
> than e-fence, on stock binaries with no recompilation. Hell, I can do
> better than *Purify* on *stock binaries* and without tangling with their
> object-code-insertion patents.
>
> I have enough information available in the proxy ptrace filter to
> implement PTRACE_SINGLESTEP_BACKWARDS. How would you like to have that
> capability in gdb? "Execute backwards until this data watchpoint
> changes." Imagine a graphical debugger with a scrollbar for time,
> where the top is "beginning of execution" and the bottom is "end of
> execution."
>
> And remember, you are doing all this on a trace file that the user of
> your program sent in from the field without changing a *damn thing*
> on their system, except for running the trace wrapper program. They
> don't even need symbols on their executable, as long as you have an
> identical executable that does have symbols.
>
> Your customer's Apache tips over every two weeks under heavy load?
> Tell them to run it under the tracer and send you a trace file the next
> time it tips over.
>
> You need to debug your real-time embedded program? Trace it, run it in
> real time, then take the trace file back to your high-powered workstation.
>
> This is radical paradigm-shifting technology. It's the best program I
> ever wrote. It's probably the best program I ever *will* write in my
> entire life.
>
> The entire reason I got involved in linux development was to reach a
> point where I could talk about this technology and get more than two
> people to download the damn demo and try it out. To get to a place
> where the gdb maintainers at cygnus would respond to my letters.
>
> It hurts to talk about this. It brings tears to my eyes.

So, help us add checkpoint migration facilities to GNU Queue. You can
throw in your ptrace facility for free.

> I suppose it's off-topic, too, because it is a user space program.
> No kernel hooks needed.
>
> Time to get back to xconfig bugs.
>
> Michael Elizabeth Chastain
> <mailto:mec@shout.net>
> "love without fear"

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/