Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7

From: Ingo Molnar
Date: Thu Sep 02 2004 - 08:40:15 EST



* Mark_H_Johnson@xxxxxxxxxxxx <Mark_H_Johnson@xxxxxxxxxxxx> wrote:

> >In any case, please enable nmi_watchdog=1 so that we can see (in -Q7)
> >what happens on the other CPUs during such long delays.
>
> Booted with nmi_watchdog=1, saw the kernel message indicating that
> NMI was checked OK.
>
> The first trace looks something like this...
>
> latency 518 us, entries: 79
> ...
> started at schedule+0x51/0x740
> ended at schedule+0x337/0x740
>
> 00000001 0.000ms (+0.000ms): schedule (io_schedule)
> 00000001 0.000ms (+0.000ms): sched_clock (schedule)
> 00010001 0.478ms (+0.478ms): do_nmi (sched_clock)
> 00010001 0.478ms (+0.000ms): do_nmi (<08049b21>)
> 00010001 0.482ms (+0.003ms): profile_tick (nmi_watchdog_tick)
> ...
> and a few entries later ends up at do_IRQ (sched_clock).
>
> The second trace goes from dequeue_task to __switch_to with a
> similar pattern - the line with do_nmi has a +0.282ms duration and
> the line notifier_call_chain (profile_hook) has a +0.135ms duration.
>
> I don't see how this provides any additional information but will
> provide several additional traces when the test gets done in a few
> minutes.

thanks. The NMI gives us two kinds of information, both useful:

- if the number of do_nmi() hits within such sections roughly matches
the number of NMIs we'd expect during the summed duration of these
sections, then the delay is most likely real wall-clock time and not
some measurement artifact (an RDTSC artifact or a tracing bug). The
NMIs are triggered (indirectly) by the PIT, and the PIT is an
independent clock whose frequency does not depend on the rest of the
system (the CPU clock, DMA activity, IRQ load, etc.)

since most of the codepaths in question (the scheduler's
dequeue_task(), etc.) run with interrupts disabled, the normal timer
interrupts (smp_apic_timer_interrupt() and do_IRQ(00000000)) cannot
'sample' these codepaths. Only the NMI can interrupt them.
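
to make that concrete, here's a minimal sketch of such a section (not
the actual scheduler code - the lock and function names are made up
for illustration):

/* sketch: an irqs-off critical section in the style of the 2.6
 * scheduler; 'rq_lock' and 'critical_section' are made-up names */
#include <linux/spinlock.h>

static spinlock_t rq_lock = SPIN_LOCK_UNLOCKED;

static void critical_section(void)
{
	unsigned long flags;

	spin_lock_irqsave(&rq_lock, flags);
	/*
	 * maskable interrupts are off here, so the timer tick
	 * (smp_apic_timer_interrupt()/do_IRQ()) can never run inside
	 * this section - only the non-maskable NMI can, which is why
	 * do_nmi() entries in a trace of such a section prove that
	 * real wall-clock time elapsed.
	 */
	spin_unlock_irqrestore(&rq_lock, flags);
}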

- the NMIs also sample what happens on the other CPU. In your above
trace this gives:

> 00010001 0.478ms (+0.478ms): do_nmi (sched_clock)
> 00010001 0.478ms (+0.000ms): do_nmi (<08049b21>)

the other CPU was executing userspace code during the last NMI tick -
i.e. nothing that could be suspect. 'suspect' code would be some sort
of kernel code that could in theory interact with this CPU's scheduler
code.

this too is statistical sampling, so we'll need as many of these
traces as possible.
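
for example (a back-of-the-envelope check, assuming the nmi_watchdog=1
NMI fires at the timer rate, HZ=1000 - adjust if your HZ differs):

#include <stdio.h>

int main(void)
{
	const double nmi_hz    = 1000.0;  /* assumed NMI rate (HZ)  */
	const double section_s = 518e-6;  /* one 518 usecs section  */
	int n;

	/*
	 * ~0.5 expected NMIs per 518 usecs section: a single trace
	 * proves little, but ~100 such sections should show ~50
	 * do_nmi() hits if the delays are real wall-clock time.
	 */
	for (n = 1; n <= 100; n *= 10)
		printf("%3d sections -> expect ~%4.1f do_nmi() hits\n",
		       n, n * section_s * nmi_hz);
	return 0;
}

a single section thus catches an NMI only about half the time, which
is why one trace alone is inconclusive.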

a wacky guess based on this single sampling point: the delays seem to
be real wall-clock delays, and the only theory matching the evidence so
far is that DMA traffic on the memory bus somehow stalls this CPU's
memory traffic for up to 500 usecs. But how could userspace running on
CPU#0 impact the kernel's scheduler code on CPU#1?

Ingo