Re: [PATCH] tracing: Trace instrumentation begin and end

From: Thomas Gleixner
Date: Wed Mar 22 2023 - 11:39:59 EST


Steven!

On Wed, Mar 22 2023 at 08:48, Steven Rostedt wrote:
> On Wed, 22 Mar 2023 12:19:14 +0100
> Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> Seriously? That's completely insane. Have you actually looked how many
>> instrumentation_begin()/end() pairs are in the affected code pathes?
>>
>> Obviously not. It's a total of _five_ for every syscall and at least
>> _four_ for every interrupt/exception from user mode.
>>
>> The number #1 design rule for instrumentation is to be as non-intrusive as
>> possible and not to be as lazy as possible.
>
> And it still is. It still uses nops when not enabled. I could even add a
> config to only have this compiled in when the config is set, so that
> production can disable it if it wants to.
>
> Just in case it's not obvious:
>
> if (tracepoint_enabled(instrumentation_begin))
> call_trace_instrumentation_begin(ip, pip);
>
> is equivalent to:
>
> if (static_key_false(&__tracepoint_instrumentation_begin.key))
> call_trace_instrumentation_begin(ip, pip);

And that makes the insanity of enabling 10 tracepoints in the syscall
path and at mininum 8 tracepoints in the exception/interrupt path any
smaller?

> We have trace points in preempt_enable/disable() I think that's *far* more
> intrusive.

What? If you want to do preempt_enable()/disable() tracing then there is
no other choice than tracing every single invocation.

But for figuring out how long a syscall, interrupt or exception takes
there are exactly TWO tracepoints required, not 10 or 8. And it's bloody
obvious where to place them, right?

>> instrumentation_begin()/end() is solely meant for objtool validation and
>> nothing else.
>>
>> There are clearly less horrible ways to retrieve the #PF duration, no?
>
> It's not just for #PF, that was just one example. I use to use function
> graph tracing max_depth_count=1 to verify NO_HZ_FULL to make sure there's
> no entry into the kernel. That doesn't work anymore. Even compat syscalls
> are not traced.

That still works. noinstr did neither break syscall tracing nor any of
the interrupt/exception tracepoints which can be used to validate the
NOHZ full mechanics. Your fancy favourite script might not work anymore,
but claiming that it can't be traced is just nonsense.

> I lost a kernel feature with the noinstr push and this is the closest that
> comes to bringing it back.

This is the closest _you_ came up with without thinking about it for a
split second.

> And the more we add noinstr, the more the kernel becomes a black box
> again.

It does not. noinstr is a technical requirement to keep instrumentation
out of code pathes which are not able to handle instrumentation. You
know that very well, so please stop this theatrical handwaving and come
back if you have sensible technical arguments.

Thanks,

tglx