Re: [PATCH 0/5] [GIT PULL] updates for tip/tracing/ftrace

From: Ingo Molnar
Date: Sat Mar 21 2009 - 14:19:30 EST



* Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:

> > [<ffffffff8020c79d>] return_to_handler+0x0/0x73
> > [<ffffffff8029ea13>] rcu_pending+0x2c/0x5e
> > [<ffffffff8020c79d>] return_to_handler+0x0/0x73
> > [<ffffffff8026abef>] update_process_times+0x3c/0x77
> > [<ffffffff8020c79d>] return_to_handler+0x0/0x73
> > [<ffffffff802875dd>] tick_periodic+0x6e/0x70
>
>
> Still hanging in the timer interrupt.
> I guess it makes the timer interrupt servicing too slow and then
> once it is serviced, another one is raised.
>
> But the cause is perhaps more complex
>
> I think you have had too many hangs of this type. I'm preparing
> a fix that checks periodically if the function graph tracer is
> spending too much time in an interrupt.
>
> I guess I could count the number of functions executed between the
> irq entry and its exit.
>
> That's the best: if we are hanging in an interrupt, it could be
> whatever interrupt and the jiffies could not be progressing so I
> can't rely on time but only on number of functions executed.
>
> Maybe 10000 calls is a good threshold before killing the function
> graph tracer inside an interrupt?

i think the problem isn't even the IRQ handler - but the fact that
the (timer) irq handler gets re-triggered - so all we do is
process timer IRQs.

Your patch would detect a timer IRQ hanging - but it would not
detect the 'system makes no progress because there's always another
pending timer IRQ to execute' situation.

So i think we need a "function trace watchdog" - which kills the
tracer if we do more than 100,000,000 entries since we started the
self-test, or so.
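
A rough userspace sketch of the difference between the two checks, not
kernel code. All names here (struct trace_watchdog, watchdog_func_entry,
simulate_irq_storm and so on) are made up for illustration, and the
limits are parameters rather than anything agreed on. The per-IRQ
counter is reset on every irq entry, so a re-triggered timer IRQ never
trips it; only the counter that runs since self-test start does.

```c
#include <stdbool.h>

/* Hypothetical watchdog state; none of these names exist in the kernel. */
struct trace_watchdog {
	unsigned long irq_calls;   /* traced entries since the last irq entry */
	unsigned long total_calls; /* traced entries since self-test start */
	unsigned long irq_limit;   /* per-IRQ threshold, e.g. 10000 */
	unsigned long total_limit; /* global threshold, e.g. 100000000 */
	bool tracer_killed;
};

static void watchdog_start_selftest(struct trace_watchdog *wd,
				    unsigned long irq_limit,
				    unsigned long total_limit)
{
	wd->irq_calls = 0;
	wd->total_calls = 0;
	wd->irq_limit = irq_limit;
	wd->total_limit = total_limit;
	wd->tracer_killed = false;
}

/* The per-IRQ count starts over on each irq entry. */
static void watchdog_irq_enter(struct trace_watchdog *wd)
{
	wd->irq_calls = 0;
}

/*
 * Called on every traced function entry. Returns false once the
 * tracer has been killed by either threshold.
 */
static bool watchdog_func_entry(struct trace_watchdog *wd)
{
	if (wd->tracer_killed)
		return false;

	wd->irq_calls++;
	wd->total_calls++;

	if (wd->irq_calls > wd->irq_limit ||
	    wd->total_calls > wd->total_limit)
		wd->tracer_killed = true;

	return !wd->tracer_killed;
}

/*
 * Simulate a timer IRQ that is raised again as soon as it has been
 * serviced. Returns how many IRQs completed before the tracer died
 * (or max_irqs if it never did).
 */
static unsigned long simulate_irq_storm(struct trace_watchdog *wd,
					unsigned long calls_per_irq,
					unsigned long max_irqs)
{
	unsigned long irqs, i;

	for (irqs = 0; irqs < max_irqs; irqs++) {
		watchdog_irq_enter(wd);
		for (i = 0; i < calls_per_irq; i++)
			if (!watchdog_func_entry(wd))
				return irqs;
	}
	return irqs;
}
```

With 100 traced calls per IRQ, the per-IRQ limit of 10000 is never
reached no matter how many IRQs are serviced back to back, while the
global limit still fires eventually.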

Ingo