Re: [Bug #12465] KVM guests stalling on 2.6.28 (bisected)

From: Kevin Shanahan
Date: Mon Jan 26 2009 - 04:56:06 EST


On Wed, 2009-01-21 at 16:18 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@xxxxxxxxxx> wrote:
> > It means, a scheduling problem. Can you run the latency tracer (which
> > only works with realtime priority), so we can tell if it is (a) kvm
> > failing to wake up the vcpu properly or (b) the scheduler delaying the
> > vcpu from running.
>
> Could we please get an ftrace capture of the incident?
>
> Firstly, it makes sense to simplify the tracing environment as much as
> possible: for example single-CPU traces are much easier to interpret.
>
> Can you reproduce it with just one CPU online? I.e. if you offline all the
> other cores via:
>
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> [etc.]
>
> and keep CPU#0 only, do the latencies still occur?
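>
> For example, a small loop like this should do it (a sketch - the glob
> deliberately skips cpu0):
>
> for c in /sys/devices/system/cpu/cpu[1-9]*/online; do echo 0 > $c; done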
>
> If they do still occur, then please do the traces that way.
>
> [ If they do not occur then switch back on all CPUs - we'll sort out the
> traces ;-) ]
>
> Then please build a function tracer kernel, by enabling:
>
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_DYNAMIC_FTRACE=y
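>
> If your distro keeps the config in /boot (an assumption - otherwise
> try /proc/config.gz), you can verify the running kernel like this:
>
> grep -E 'FUNCTION(_GRAPH)?_TRACER|DYNAMIC_FTRACE' /boot/config-$(uname -r)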
>
> Once you boot into such a kernel, you can switch on function tracing via:
>
> cd /debug/tracing/
>
> echo 0 > tracing_enabled
> echo function_graph > current_tracer
> echo funcgraph-proc > trace_options
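>
> If nothing is mounted on /debug yet, mount debugfs there first (the
> device name is arbitrary):
>
> mkdir -p /debug
> mount -t debugfs nodev /debug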
>
> Tracing is not running yet; first, find a suitable set of functions to
> trace. For example, this will be a pretty good starting point for
> scheduler+KVM problems:
>
> echo '' > set_ftrace_filter # clear filter functions
> echo '*sched*' >> set_ftrace_filter
> echo '*wake*' >> set_ftrace_filter
> echo '*kvm*' >> set_ftrace_filter
> echo 1 > tracing_enabled # let the tracer go
>
> You can see your current selection of functions to trace via 'cat
> set_ftrace_filter', and you can see all functions via 'cat
> available_filter_functions'.
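>
> For instance, to see which KVM-related functions can be traced:
>
> grep kvm available_filter_functions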
>
> You can also trace all functions via:
>
> echo '*' > set_ftrace_filter
>
> Tracer output can be captured from the 'trace' file. It should look like
> this:
>
> 15)  cc1-28106   |   0.263 us  |    page_evictable();
> 15)  cc1-28106   |             |    lru_cache_add_lru() {
> 15)  cc1-28106   |   0.252 us  |      __lru_cache_add();
> 15)  cc1-28106   |   0.738 us  |    }
> 15)  cc1-28106   | + 74.026 us |  }
> 15)  cc1-28106   |             |  up_read() {
> 15)  cc1-28106   |   0.257 us  |    _spin_lock_irqsave();
> 15)  cc1-28106   |   0.253 us  |    _spin_unlock_irqrestore();
> 15)  cc1-28106   |   1.329 us  |  }
>
> To capture a continuous stream of all trace data you can do:
>
> cat trace_pipe > /tmp/trace.txt
>
> (this will also drain the trace ringbuffers.)
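>
> If you only want a fixed window - say 30 seconds around one of the
> stalls - a plain shell sketch like this works too:
>
> cat trace_pipe > /tmp/trace.txt &
> CATPID=$!
> sleep 30
> kill $CATPID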
>
> Note that this can be quite expensive if a lot of functions are being
> traced - so it makes sense to trim the set of traced functions down to
> only the interesting ones. Which ones are interesting can be determined
> by looking at the traces. You should see your KVM threads become active
> every second as the ping happens.
>
> If you get lost events you can increase the trace buffer size via the
> buffer_size_kb control - the default is around 1.4 MB.
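>
> For example, to check the current size and bump it to 10 MB:
>
> cat buffer_size_kb
> echo 10240 > buffer_size_kb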
>
> Let me know if any of these steps is causing problems or if interpreting
> the traces is difficult.

Just carrying out the steps was okay, but I don't really know what I'm
looking at. I've uploaded the trace here (about 10 seconds' worth, I
think):

http://disenchant.net/tmp/bug-12465/trace-1/

The guest being pinged is process 4353:

kmshanah@flexo:~$ pstree -p 4353
qemu-system-x86(4353)─┬─{qemu-system-x86}(4354)
                      ├─{qemu-system-x86}(4355)
                      └─{qemu-system-x86}(4772)

I guess the larger overhead/duration values are what we are looking for,
e.g.:

kmshanah@flexo:~$ bzgrep -E '[[:digit:]]{6,}' trace.txt.bz2
0)  ksoftir-4     | ! 3010470 us  |  }
0)  qemu-sy-4354  | ! 250406.2 us |  }
0)  qemu-sy-4354  | ! 250407.0 us |  }
0)  qemu-sy-4354  | ! 362946.3 us |  }
0)  qemu-sy-4354  | ! 362947.0 us |  }
0)  qemu-sy-4177  | ! 780480.3 us |  }
0)  qemu-sy-4354  | ! 117685.7 us |  }
0)  qemu-sy-4354  | ! 117686.5 us |  }

That ksoftirqd value is a bit strange (> 3 seconds - or is the
formatting wrong?). I guess I still need some guidance on what I'm
looking at in this trace and/or what to do next.

Cheers,
Kevin.

