Re: [RFC] convert ftrace syscall tracer to TRACE_EVENT()

From: Frédéric Weisbecker
Date: Sat May 09 2009 - 11:02:00 EST


2009/5/9 Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx>:
> * Ingo Molnar (mingo@xxxxxxx) wrote:
>>
>> * Frédéric Weisbecker <fweisbec@xxxxxxxxx> wrote:
>>
>> > > I would expect to use copy_string_from_user (for strings) and
>> > > copy_from_user for structures, because without any strings
>> > > (especially), the trace information becomes much less useful.
>> >
>> > Yeah, for structures we would just need the copy_from_user.
>>
>> There's just a few places (mainly related to VFS APIs) where we
>> really want to do that, and there we want to do it a bit later, not
>> at syscall time: we want to do it after the getname(), to output a
>> stable (and already copied to kernel space) copy of the file name.
>>
>> So the right solution there would be to add special, case by case
>> tracepoints to those few places. We don't need strings for the
>> majority of the 300+ system calls that exist on Linux.
>>
>>       Ingo
>
> Hrm, this is an important design decision. I cover a lot of those sites
> in my LTTng instrumentation, and this is clearly one way to do it, at
> the expense of adding tracepoints in many kernel locations when there
> could be a functional equivalent with syscall instrumentation.


Yeah, these tracepoints defined from DEFINE_SYSCALL are a good way
to proceed generically.
For specific cases, we can later add some upper layer, such as described below.


> The thing we would need to do it from the syscall tracing site is a
> table to map the system call numbers to their specific types (for the
> syscalls we care about) and therefore which would also map to a
> serialisation function to extract the parameters and write the correct
> content into the trace buffers.


I would rather key this on the type of a parameter than on the
syscall number.
The same complex type is often used by several syscalls.
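
To make the idea concrete, here is a rough userspace sketch of a
serializer table keyed by parameter type. Everything in it (the
param_type tags, demo_timespec, serialize_param) is a hypothetical name
made up for illustration; none of it is existing kernel API, and a real
version would write into the trace buffers rather than a char array.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parameter type tags, invented for this sketch. */
enum param_type {
	PARAM_LONG,
	PARAM_STRING,
	PARAM_TIMESPEC,
	PARAM_TYPE_MAX
};

/* Stand-in for a complex type shared by several syscalls. */
struct demo_timespec {
	long tv_sec;
	long tv_nsec;
};

typedef int (*serialize_fn)(char *buf, size_t len, const void *arg);

static int serialize_long(char *buf, size_t len, const void *arg)
{
	return snprintf(buf, len, "%ld", *(const long *)arg);
}

static int serialize_string(char *buf, size_t len, const void *arg)
{
	return snprintf(buf, len, "\"%s\"", (const char *)arg);
}

static int serialize_timespec(char *buf, size_t len, const void *arg)
{
	const struct demo_timespec *ts = arg;

	return snprintf(buf, len, "{%ld, %ld}", ts->tv_sec, ts->tv_nsec);
}

/* One serializer per type, reused by every syscall taking that type. */
static const serialize_fn serializers[PARAM_TYPE_MAX] = {
	[PARAM_LONG]     = serialize_long,
	[PARAM_STRING]   = serialize_string,
	[PARAM_TIMESPEC] = serialize_timespec,
};

/* Returns the snprintf() length, or -1 for an unknown type tag. */
static int serialize_param(enum param_type type, char *buf, size_t len,
			   const void *arg)
{
	assert(buf != NULL && arg != NULL);
	if ((unsigned int)type >= PARAM_TYPE_MAX)
		return -1;
	return serializers[type](buf, len, arg);
}
```

With this layout, a syscall description only has to list the type tag of
each of its parameters, and adding a new complex type means adding one
serializer rather than touching every syscall that uses it.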

If we want even better precision, we can also pair that with a
per-syscall mapping for specific post-processing at output time. For
example, to print O_RDONLY instead of the corresponding number.
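
As an illustration of that kind of output-time post-processing, here is
a small userspace sketch that renders open(2) flags symbolically. The
helper name format_open_flags and the (deliberately incomplete) flag
table are assumptions made for the example, not existing code. Note
that the access mode has to be decoded through O_ACCMODE rather than
tested bit by bit, since O_RDONLY is 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

struct flag_name {
	int value;
	const char *name;
};

/* Only a few flags are listed; a real table would cover them all. */
static const struct flag_name open_flag_names[] = {
	{ O_CREAT,  "O_CREAT"  },
	{ O_EXCL,   "O_EXCL"   },
	{ O_TRUNC,  "O_TRUNC"  },
	{ O_APPEND, "O_APPEND" },
};

#define NR_FLAG_NAMES (sizeof(open_flag_names) / sizeof(open_flag_names[0]))

/*
 * Hypothetical output-time helper: write a symbolic form of open(2)
 * flags into buf. Returns the string length, or -1 if buf is too small.
 */
static int format_open_flags(char *buf, size_t len, int flags)
{
	const char *mode;
	size_t i;
	int n;

	assert(buf != NULL && len > 0);

	/* The access mode is a field, not a bitmask: O_RDONLY is 0. */
	switch (flags & O_ACCMODE) {
	case O_RDONLY: mode = "O_RDONLY"; break;
	case O_WRONLY: mode = "O_WRONLY"; break;
	case O_RDWR:   mode = "O_RDWR";   break;
	default:       mode = "O_ACCMODE?"; break;
	}
	n = snprintf(buf, len, "%s", mode);
	flags &= ~O_ACCMODE;

	for (i = 0; i < NR_FLAG_NAMES; i++) {
		if (n < 0 || (size_t)n >= len)
			return -1;	/* output was truncated */
		if (!(flags & open_flag_names[i].value))
			continue;
		n += snprintf(buf + n, len - n, "|%s", open_flag_names[i].name);
		flags &= ~open_flag_names[i].value;
	}
	if (n < 0 || (size_t)n >= len)
		return -1;

	/* Print any bits the table does not know about as raw hex. */
	if (flags)
		n += snprintf(buf + n, len - n, "|0x%x", (unsigned int)flags);

	return ((size_t)n >= len) ? -1 : n;
}
```

Called with O_WRONLY | O_CREAT | O_TRUNC, this writes
"O_WRONLY|O_CREAT|O_TRUNC" into the buffer. The trace itself can keep
storing the raw number; only the printing side needs to know the names.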


>
> We could also use getname()/putname() in the syscall tracing primitive.
> Note that architectures like x86_64 need some tweaks I have in my
> patchset to correctly ensure that syscall entry/exit are always paired.
> This is required because we change the thread flag synchronously with
> thread execution upon activation/deactivation.


Not sure I understand your point here. The only problem resulting from
such a race would be rare unpaired syscall entry or exit traces... Is it
that important?