Re: BUG: ftrace/perf dropping events at the begin of interrupt handlers

From: Daniel Bristot de Oliveira
Date: Fri Dec 14 2018 - 05:21:42 EST


On 12/4/18 8:16 PM, Steven Rostedt wrote:
> Yes, it's a simple fix. The problem is that the recursion detection of
> the function tracer requires that when it's called from an interrupt, the
> "in_interrupt" needs to be true, otherwise it thinks that the function
> tracer is recursing on itself (which is common).
>
> Looking at the dropped events, and the code in __irq_enter() we have
> this:
>
> #define __irq_enter() \
>         do { \
>                 account_irq_enter_time(current); \
>                 preempt_count_add(HARDIRQ_OFFSET); \ <<-- in_interrupt() returns true here
>                 trace_hardirq_enter(); \
>         } while (0)
>
> Interestingly enough, the dropped events happen to be in
> account_irq_enter_time()!
>
> Thus what I believe is happening is that an interrupt came in while one
> event was being recorded. When account_irq_enter_time was called, the
> function tracer noticed that its recursion bit for the current context
> was already set, and just dropped the event because it thought it was
> just tracing itself. After we add HARDIRQ_OFFSET to preempt_count, the
> "in_interrupt()" will be set and the function tracer will know it's in a
> new context where it's safe to continue tracing.
>
> Can you try this patch to see if it fixes it for you?

Hi Steve,

I finally took some time to play with the patch, sorry for the delay. I got the
idea of the patch, but it is not working as expected :-(.

When I enable it, the system [a VM with 1 CPU] mostly freezes when I run this:

# while [ 1 ]; do echo > /dev/null; done &

I still need to investigate why.

The other point: as I understand it, the patch would make
account_irq_enter_time() show up in the trace. But, as far as I can tell, it
still would not trace do_IRQ(). Right?

Wouldn't it be a case for using a per-cpu variable to set the flag right at the
beginning of the handler (in the entry*.S)?
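
A minimal user-space sketch of the window being discussed, assuming a
simplified model with one recursion bit per context. This is not kernel code:
every name here (preempt_count_model, early_irq_flag, trace_function_model(),
run_scenario()) is hypothetical, and the function names passed in are only
labels. It assumes the early functions are traceable at all, and it only
illustrates why events are dropped before HARDIRQ_OFFSET is added and how an
early flag would close that window.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum ctx_model { CTX_NORMAL, CTX_IRQ };

#define HARDIRQ_OFFSET_MODEL 0x10000u

static unsigned int preempt_count_model; /* stands in for preempt_count */
static bool early_irq_flag;              /* stands in for the proposed per-cpu flag */
static unsigned int recursion_bits;      /* one bit per context */
int recorded_events;
int dropped_events;

/* Decide which context we are in, optionally honouring the early flag. */
static enum ctx_model current_ctx(bool use_early_flag)
{
        if (preempt_count_model & HARDIRQ_OFFSET_MODEL)
                return CTX_IRQ;
        if (use_early_flag && early_irq_flag)
                return CTX_IRQ;
        return CTX_NORMAL;
}

/* Take the recursion bit of the current context; -1 means "recursion". */
static int trace_lock_model(bool use_early_flag)
{
        unsigned int bit = 1u << current_ctx(use_early_flag);

        if (recursion_bits & bit)
                return -1;
        recursion_bits |= bit;
        return (int)bit;
}

static void trace_unlock_model(int bit)
{
        recursion_bits &= ~(unsigned int)bit;
}

/* Record one function event, or drop it if the context bit is already set. */
static void trace_function_model(const char *name, bool use_early_flag)
{
        int bit = trace_lock_model(use_early_flag);

        if (bit < 0) {
                dropped_events++;
                printf("%-24s dropped\n", name);
                return;
        }
        recorded_events++;
        printf("%-24s recorded\n", name);
        trace_unlock_model(bit);
}

/* An interrupt arrives while normal context is in the middle of recording. */
int run_scenario(bool use_early_flag)
{
        int outer;

        preempt_count_model = 0;
        recursion_bits = 0;
        recorded_events = dropped_events = 0;

        outer = trace_lock_model(use_early_flag); /* normal context holds its bit */

        early_irq_flag = true;                    /* set at the very start of entry */
        trace_function_model("do_IRQ", use_early_flag);
        trace_function_model("account_irq_enter_time", use_early_flag);
        preempt_count_model += HARDIRQ_OFFSET_MODEL;
        trace_function_model("handler", use_early_flag);
        preempt_count_model -= HARDIRQ_OFFSET_MODEL;
        early_irq_flag = false;

        trace_unlock_model(outer);
        return dropped_events;
}
```

In this model, run_scenario(false) drops the two events that occur before the
preempt count is updated, while run_scenario(true) drops none, because the
early flag already moves the tracer to the interrupt context bit.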

Thoughts?

-- Daniel