[discuss] BTS overflow handling, was: [PATCH] perf_counter: Fix a race on perf_counter_ctx

From: Metzger, Markus T
Date: Tue Sep 01 2009 - 07:18:32 EST

Ingo, Peter,

>>>> Currently, I'm not sure that this (i.e. that the interrupt
>>>> handling takes too long) is the underlying problem of the hangs
>>>> that I'm seeing.
>>>I haven't seen a plausible theory yet about why an actual lockup
>>>would happen on your box.
>>So you do not think that taking too long in the ISR could cause this?
>>And is it working on your box?

My current theory is that the BTS buffer fills up so quickly when tracing
the kernel that the kernel is kept busy handling overflows and servicing
the other interrupts that pile up while we're handling the BTS overflow.

When I trace user-mode branches, it works.

When I do not copy the trace during overflow handling, the kernel does not hang.

When I attach a JTAG debugger to a hung system (running perf top and
perf record -e branches -c 1), I find that one core is waiting for an SMP
call response, while the other core is busy emptying the BTS buffer.

When I then disable branch tracing (the debugger prevents the kernel
from changing DEBUGCTL to enable tracing again), the system recovers.

I have a patch that switches buffers during overflow handling and leaves
the draining for later (which currently never happens). With that patch,
the kernel does not hang.
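
To make the buffer-switching idea concrete, here is a minimal user-space
sketch. It is not the patch itself; the names (bts_bufset,
bts_switch_on_overflow, bts_drain_one) and the fixed count of 3 buffers are
assumptions made for illustration. The overflow path only swaps in a free
buffer and never copies; copying is left to a separate, deferred drain step.

```c
#include <assert.h>
#include <stddef.h>

#define BTS_NBUF 3	/* hypothetical: number of buffers per cpu */

enum bts_buf_state { BTS_FREE, BTS_ACTIVE, BTS_FULL };

/* Hypothetical per-cpu bookkeeping; the real buffers are not modeled. */
struct bts_bufset {
	enum bts_buf_state state[BTS_NBUF];
	int active;		/* buffer the hardware currently writes to */
	unsigned long lost;	/* overflows that found no free buffer */
};

static void bts_bufset_init(struct bts_bufset *s)
{
	for (int i = 0; i < BTS_NBUF; i++)
		s->state[i] = BTS_FREE;
	s->active = 0;
	s->state[0] = BTS_ACTIVE;
	s->lost = 0;
}

/*
 * Overflow path: do not copy anything, only switch to a free buffer.
 * Returns the index of the buffer that is now waiting to be drained,
 * or -1 if no buffer was free and the trace in the active one is lost.
 */
static int bts_switch_on_overflow(struct bts_bufset *s)
{
	assert(s->state[s->active] == BTS_ACTIVE);

	for (int i = 0; i < BTS_NBUF; i++) {
		if (s->state[i] == BTS_FREE) {
			int full = s->active;

			s->state[full] = BTS_FULL;
			s->state[i] = BTS_ACTIVE;
			s->active = i;
			return full;
		}
	}
	s->lost++;	/* keep using the active buffer; old trace is lost */
	return -1;
}

/*
 * Deferred path: release one full buffer after its records have been
 * copied out (the copy itself is omitted here). Returns the index of
 * the drained buffer, or -1 if nothing was pending.
 */
static int bts_drain_one(struct bts_bufset *s)
{
	for (int i = 0; i < BTS_NBUF; i++) {
		if (s->state[i] == BTS_FULL) {
			s->state[i] = BTS_FREE;
			return i;
		}
	}
	return -1;
}
```

In this model, two overflows in a row can be absorbed without draining;
a third one before any drain has run increments the lost count, which
corresponds to losing trace when the draining does not start in time.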

I do need 3 buffers of 2048 entries = 3x48 pages per cpu, though: the
active buffer, one to switch in during overflow handling, and another to
switch in during sched_out (assuming that we need to schedule out the
traced task before we may start the draining task). Even then, there's a
chance that we will lose trace if the draining task does not start
immediately. I would even say that this is quite likely.
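
As a rough cross-check of the memory cost, the arithmetic can be sketched
as below. The record size of 24 bytes (three 64-bit fields per BTS record
on a 64-bit kernel) and the 4 KiB page size are assumptions, and the helper
names are hypothetical.

```c
#include <stddef.h>

/* Assumption: one BTS record is three 64-bit fields (from, to, flags). */
#define BTS_RECORD_BYTES 24u
/* Assumption: 4 KiB pages. */
#define PAGE_BYTES 4096u

/* Bytes needed for one buffer holding the given number of records. */
static size_t bts_buffer_bytes(size_t entries)
{
	return entries * BTS_RECORD_BYTES;
}

/* Pages needed for one buffer, rounded up to whole pages. */
static size_t bts_buffer_pages(size_t entries)
{
	return (bts_buffer_bytes(entries) + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Bytes needed per cpu for nbuf buffers of the given size. */
static size_t bts_total_bytes(size_t nbuf, size_t entries)
{
	return nbuf * bts_buffer_bytes(entries);
}
```

Under these assumptions a 2048-entry buffer is 48 KiB (12 pages of 4 KiB),
and three of them come to 144 KiB per cpu. With a different record or page
size the numbers change accordingly.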

What I do not have yet is the actual draining. Draining needs to start
after the counter has been disabled. But draining needs the perf_counter
to drain the trace into. The counter will thus be busy after it has been
disabled, which is ugly.

There already seems to be something in place regarding deferring work, i.e.
perf_counter_do_pending(). Would it be OK if I added the deferred BTS buffer
draining to that?

Looks like this would guarantee that the counter does not go away as long
as there is work pending. Is this correct?
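
The guarantee being asked about can be modeled as follows. This is only a
sketch of the desired property, not a description of what
perf_counter_do_pending() actually does; all names (counter_sketch,
queue_drain, do_pending, counter_put) are hypothetical. Queuing a drain
takes a reference on the counter, and the deferred handler drops it once
the drain is done, so the counter cannot go away in between.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a counter with deferred drain work. */
struct counter_sketch {
	int refcount;		/* references held on the counter */
	bool drain_pending;	/* a deferred drain has been queued */
	bool freed;		/* set when the last reference is dropped */
	unsigned long drained;	/* number of deferred drains executed */
};

/* Drop one reference; the last put marks the counter as freed. */
static void counter_put(struct counter_sketch *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0)
		c->freed = true;	/* real code would release memory here */
}

/*
 * Called from the overflow or sched_out path: queue the drain and pin
 * the counter. Queuing twice before the drain runs takes one reference.
 */
static void queue_drain(struct counter_sketch *c)
{
	if (!c->drain_pending) {
		c->drain_pending = true;
		c->refcount++;
	}
}

/* Called later from a context where copying the trace is acceptable. */
static void do_pending(struct counter_sketch *c)
{
	if (c->drain_pending) {
		c->drain_pending = false;
		c->drained++;		/* the actual copy is omitted */
		counter_put(c);		/* release the pin taken when queuing */
	}
}
```

In this model, if the owner drops its reference while a drain is still
queued, the counter stays alive until do_pending() has run, and is freed
only afterwards.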

In any case, this is getting late for the upcoming merge window.
Would you rather drop the BTS patch or disable kernel tracing?

thanks and regards,


