I see it exactly the opposite. Only a very small minority of cases will
have such severe memory corruption that tracing will fall apart because of
random writes to memory, especially on 64-bit where the address space is
sparse. On the other hand, knowing that the cost is a few dozen cycles
rather than a thousand or so means that you can trace production servers
running full loads without worrying about whether tracing will affect
whatever it is you're trying to observe.
I'm not against slow reliable tracing, but we shouldn't ignore the need for
speed.

I haven't seen a concise summary of your points in this thread, so let me
summarize them as I've understood them (hopefully not putting words into your
mouth): AFAICS you are arguing for some crazy, fragile, architecture-specific
solution that traps INT3 into ring 3 just to shave off a few cycles, and then
uses user-space state to trace into.
If so, then you ignore the obvious solution to _that_ problem: don't use INT3
at all, but rebuild (or re-JIT) your program with explicit callbacks. It's
_MUCH_ faster than _any_ breakpoint-based solution: literally just the cost of
a function call (or not even that; I've written very fast inlined tracers, and
they do rock when it comes to performance). Problem solved, and none of the
INT3 details matter at all.
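To make the "explicit callbacks" alternative concrete, here is a minimal sketch of a compiled-in, inlined tracer: each probe is a plain store into a ring buffer, with no trap and no kernel entry. It assumes a single-threaded program, and all names (`trace_record`, `handle_request`, the buffer layout) are hypothetical; a real tracer would use per-CPU or per-thread buffers and timestamps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record: an event id plus one argument. */
struct trace_event {
    uint32_t id;
    uint64_t arg;
};

#define TRACE_BUF_SIZE 1024u /* power of two so the index can be masked */

/* Single global ring buffer; assumes a single-threaded program. */
static struct trace_event trace_buf[TRACE_BUF_SIZE];
static size_t trace_head;

/* The "probe": an inlined store into the ring buffer, no INT3 trap. */
static inline void trace_record(uint32_t id, uint64_t arg)
{
    struct trace_event *e = &trace_buf[trace_head & (TRACE_BUF_SIZE - 1)];
    e->id = id;
    e->arg = arg;
    trace_head++;
}

/* Total events recorded so far (can exceed TRACE_BUF_SIZE once it wraps). */
static size_t trace_count(void)
{
    return trace_head;
}

/* Hypothetical instrumented function: the trace calls are compiled in at
   the points of interest instead of being patched in with a breakpoint. */
static uint64_t handle_request(uint64_t req)
{
    trace_record(1, req);    /* entry event */
    uint64_t result = req * 2;
    trace_record(2, result); /* exit event */
    return result;
}
```

The cost per probe is a couple of stores and an increment, which is the point being made: once you are willing to rebuild the program, the breakpoint mechanism drops out entirely.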
INT3 only matters for _transparent_ probing, and there the cost of INT3 is
almost _by definition_ less important than the fact that we can trace
transparently at all. If performance were the overriding issue, people would
use dedicated callbacks, and the INT3 technique wouldn't matter at all.
( Also, just as we were able to extend the kprobes code with more and more
optimizations, the same can be done with user-space probing to make it
faster. But at its core there has to be a sane design that is transparent and
controlled by the kernel, so that the kernel has the option to apply more and
more optimizations. Yours isn't such a design, and its limitations are
designed in.
Which is neither smart nor useful. )