Re: [PATCH 08/12] add trace events for each syscall entry/exit

From: Ingo Molnar
Date: Wed Aug 26 2009 - 03:28:41 EST

Next message: raz ben yehuda: "Re: RFC: THE OFFLINE SCHEDULER"
Previous message: Peter Zijlstra: "Re: [PATCH] tracing/profile: Fix profile_disable vs module_unload"
In reply to: Mathieu Desnoyers: "Re: [PATCH 08/12] add trace events for each syscall entry/exit"
Next in thread: Mathieu Desnoyers: "Re: [PATCH 08/12] add trace events for each syscall entry/exit"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> wrote:

> * Frederic Weisbecker (fweisbec@xxxxxxxxx) wrote:
> > On Tue, Aug 25, 2009 at 03:51:11PM -0400, Mathieu Desnoyers wrote:
> > > * Frederic Weisbecker (fweisbec@xxxxxxxxx) wrote:
> > > > On Tue, Aug 25, 2009 at 02:31:19PM -0400, Mathieu Desnoyers wrote:
> > > > > (Well, I do not have time currently to look into the gory details
> > > > > (sorry), but let's try to take a step back from the problem.)
> > > > >
> > > > > The design proposal for this kthread behavior wrt syscalls is based on a
> > > > > very specific and current kernel behavior, that may happen to change and
> > > > > that I have actually seen proven incorrect. For instance, some
> > > > > proprietary Linux driver does very odd things with system calls within
> > > > > kernel threads, like invoking them with int 0x80.
> > > > >
> > > > > > Yes, this is odd, but do we really want to tie the tracer that closely
> > > > > > to the specifics of the actual OS implementation?
> > > >
> > > >
> > > > I really can't see the point in doing this. I don't expect the kernel's
> > > > behaviour to change soon so that it issues explicit syscall interrupts
> > > > itself. It's not about a current kernel implementation fashion,
> > > > it's about kernel design sanity that is not likely to go backward.
> > > >
> > > > Is it worth it to trace kernel threads, maintain their tracing
> > > > specificities (such as the ret_from_fork workarounds that implies)
> > > > just because we want to support tracing of some silly proprietary drivers?
> > > >
> > > >
> > > > >
> > > > > That sounds like a recipe for endless breakages and missing bits of
> > > > > instrumentation.
> > > > >
> > > > > So my advice would be: if we want to trace the syscall entry/exit paths,
> > > > > let's trace them for the _whole_ system, and find ways to make it work
> > > > > for corner-cases rather than finding clever ways to diminish
> > > > > instrumentation coverage.
> > > >
> > > >
> > > > If developers of out-of-tree drivers want to implement buggy things
> > > > that would never be accepted after a minimal review here, and then instrument
> > > > their bugs, then I would suggest they implement their own ad hoc instrumentation,
> > > > really :-/
> > > >
> > > > What's the point in supporting out of tree bugs?
> > > >
> > > > Well, the only advantage of doing this would be to support reverse engineering
> > > > in tiny and rare corner cases. Not really worth the effort.
> > > >
> > > >
> > > > > Given the ret from fork example happens to be the first event fired
> > > > > after the thread is created, we should be able to deal with this problem
> > > > > by initializing the thread structure used by syscall exit tracing to an
> > > > > initial "ret from fork" value.
> > > > >
> > > > > Mathieu
> > > >
> > > >
> > > > It means we have to support and check this corner case on every arch
> > > > that supports syscall tracing, deal with crashes because we omitted it, etc...
> > > >
> > > > For all the things I've explained above I don't think it's worth the effort.
> > > >
> > > > But it's just my opinion...
> > > >
> > >
> > > Then we might want to explicitly require that calls to sys_*() system
> > > calls made from within the kernel pass through another instrumentation
> > > mechanism. IMHO, that would make sense. It would cover both system calls
> > > made from kernel threads and system calls made from within a system call
> > > or trap.
> > >
> > > Mathieu
> >
> >
> > Well, we can't really set a tracepoint per sys_*() function. Or more
> > precisely we already have them, automagically generated and relying on
> > the sysenter ptrace path.
> >
> > But if we want to check which syscalls are called from kernel threads, we have:
> >
> > - kthread() -> do_exit()
> >
> >
> > The entry point of every kernel thread (except "kthreadd") is
> > kthread(). It calls do_exit() in the end.
> >
> > If we want to trace the exit of a kernel thread, we can put
> > a tracepoint there instead of in do_exit(), whose results would
> > be intermixed with sys_exit() tracing.
> >
> >
> > - kthreadd :: create_kthread() -> kernel_thread() -> do_fork()
> >
> >
> > A thread is created by a fork() of the kthreadd thread.
> > If we want to trace the creation of kernel threads, we can again do that
> > in the upper level: kernel_thread().
> >
> > But does that inform us about who created the thread? All we would see
> > is kthreadd forking. This is very poor information compared
> > to a userspace fork() that tells us who really created the new process.
> >
> > Instead what we probably want is to trace kthread_create(), which queues the
> > thread-creation job for the kthreadd thread, so that we know
> > _who_ asked for this thread creation (the process that requested it and the callsite).
> > And that's much richer in information.
> >
> > Well, you can even climb to an upper layer and check whether this is a workqueue,
> > a kernel/async.c thread, a slow work, etc...
> >
> >
> > - kernel_execve() -> sys_execve()
> >
> > We can execute user apps from the kernel through call_usermodehelper().
> > And we can trace kernel_execve(), or again an upper layer
> > like call_usermodehelper().
> >
> > - ... I guess there are other examples
> >
> > The kernel calls syscalls through wrappers, and tracing these
> > wrappers, depending on the desired level of information
> > (choose your layer), is much more verbose / rich in
> > information.
>
> What you describe looks a lot like the approach I use in the LTTng
> tree. Actually, the main point I am trying to make here is: if we
> rely only on tracing at the syscall entry/exit level for, say,
> monitoring all uses of e.g. sys_open(), we might be caught
> off guard by internal sys_open() uses within the kernel.

There's a lot of 'internal' file opening going on within the kernel
that ptrace does not notice - see all the filp_open() calls.

Let's worry about this only if it's a true issue.
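
As a rough illustration of the filp_open() point, here is a toy userspace
C sketch, not kernel code. Every name in it (syscall_entry_open(),
internal_open(), do_open_common(), the two counters) is a hypothetical
stand-in invented for this example. It only models the idea that a hook
placed on the syscall entry path cannot see callers that reach the same
worker function directly.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters, not real kernel symbols. */
int traced_entries;   /* events seen by the syscall-entry hook */
int files_opened;     /* opens that actually happened */

/* Stand-in for the shared worker that both paths end up in. */
int do_open_common(const char *path)
{
	(void)path;               /* the path is irrelevant to the model */
	files_opened++;
	return files_opened + 2;  /* fake descriptor number */
}

/* Stand-in for the syscall entry path: the hook fires here and only here. */
int syscall_entry_open(const char *path)
{
	traced_entries++;
	return do_open_common(path);
}

/* Stand-in for an in-kernel helper in the style of filp_open():
 * it reaches the worker without passing through the entry hook. */
int internal_open(const char *path)
{
	return do_open_common(path);
}

/* Number of opens the entry-level tracer never saw. */
int untraced_opens(void)
{
	return files_opened - traced_entries;
}

/* Print both counters so the gap is visible. */
void report(void)
{
	printf("opened=%d traced=%d untraced=%d\n",
	       files_opened, traced_entries, untraced_opens());
}
```

Calling syscall_entry_open() once and internal_open() twice, then
report(), shows three opens of which only one was traced. Whether that
gap matters in practice is exactly the open question in this thread.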

Ingo
