RE: [discuss] BTS overflow handling, was: [PATCH] perf_counter: Fix a race on perf_counter_ctx

From: Metzger, Markus T
Date: Thu Sep 03 2009 - 10:26:37 EST

>-----Original Message-----
>From: Peter Zijlstra [mailto:a.p.zijlstra@xxxxxxxxx]
>Sent: Tuesday, September 01, 2009 3:53 PM
>To: Metzger, Markus T
>Cc: Ingo Molnar; tglx@xxxxxxxxxxxxx; hpa@xxxxxxxxx; markus.t.metzger@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Paul Mackerras
>Subject: RE: [discuss] BTS overflow handling, was: [PATCH] perf_counter: Fix a race on
>On Tue, 2009-09-01 at 14:32 +0100, Metzger, Markus T wrote:

>> >This makes me wonder how much time it takes to drain these buffers; is
>> >it at all possible to optimize that code path into oblivion, or will
>> >nothing be fast enough?
>> Are you saying that we should rather speed up that code path than try to
>> defer all the work? There definitely is a lot of redundant work done on
>> the generic path.
>> I did a few experiments where I would drain only parts of the buffer.
>> I could not drain too much before the system would hang.
>> Besides, that does not sound too robust to me. Would it still work on
>> a slower system? Or on a faster one? Or on a fully loaded one?
>Base cpu speed is what counts, load is not interesting.
>Also it seems a normalizing property, the slower the cpu the less
>branches it can process per time unit, so less data to process.
>But yes, I was suggesting to optimize this, since the current way of
>calling perf_counter_output() multiple times is massively bloated.

This seems to do the trick - at least on my box.

I prepare the header, then do a single perf_output_begin()/perf_output_end()
pair, and between those two I drain the entire 2048-record BTS buffer,
pretty much the same way perf_counter_output() does.
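
A minimal user-space sketch of the batching idea, assuming a simplified output buffer. struct out_buf, out_begin(), out_copy(), out_end() and struct sample_header are hypothetical stand-ins, not the kernel's perf_output_begin()/perf_output_end() or its real header layout. The batched variant prepares the header once and reserves space for all records with a single begin/end pair; the per-record variant takes one pair per record and writes the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the kernel's types or API. */
#define BTS_RECORDS 2048

struct bts_record {              /* simplified: branch source and target */
	uint64_t from;
	uint64_t to;
};

struct sample_header {           /* 8 bytes, no padding */
	uint32_t type;
	uint16_t misc;
	uint16_t size;           /* bytes in this sample, header included */
};

struct out_buf {
	unsigned char data[1 << 20];
	size_t head;             /* end of committed data */
	size_t reserved;         /* end of the open reservation */
	unsigned long pairs;     /* begin/end pairs taken so far */
};

/* Reserve 'size' bytes after head; returns 0 on success, -1 if full. */
static int out_begin(struct out_buf *ob, size_t size)
{
	if (size > sizeof(ob->data) - ob->head)
		return -1;
	ob->reserved = ob->head + size;
	ob->pairs++;
	return 0;
}

static void out_copy(struct out_buf *ob, size_t *pos, const void *src, size_t len)
{
	assert(*pos + len <= ob->reserved);  /* stay inside the reservation */
	memcpy(ob->data + *pos, src, len);
	*pos += len;
}

static void out_end(struct out_buf *ob)
{
	ob->head = ob->reserved;             /* commit the whole reservation */
}

/* Header prepared once; all n records written under one begin/end pair. */
static int drain_batched(struct out_buf *ob, const struct bts_record *rec, size_t n)
{
	const struct sample_header hdr = {
		.type = 1,
		.misc = 0,
		.size = sizeof(struct sample_header) + 2 * sizeof(uint64_t),
	};
	size_t pos = ob->head;

	if (out_begin(ob, n * hdr.size))
		return -1;
	for (size_t i = 0; i < n; i++) {
		out_copy(ob, &pos, &hdr, sizeof(hdr));
		out_copy(ob, &pos, &rec[i].from, sizeof(rec[i].from));
		out_copy(ob, &pos, &rec[i].to, sizeof(rec[i].to));
	}
	out_end(ob);
	return 0;
}

/* For comparison: one begin/end pair per record. */
static int drain_per_record(struct out_buf *ob, const struct bts_record *rec, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (drain_batched(ob, &rec[i], 1))
			return -1;
	return 0;
}
```

Both variants produce identical output; the batched one simply pays for the reservation (and, in the kernel, for taking the output lock) once per buffer instead of 2048 times.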

We could optimize this further by providing specialized draining functions,
one for each combination of PERF_SAMPLE_ bits, but it seems to be fast
enough the way it is.
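
One way such specialized draining functions could be produced without writing each by hand, sketched under assumptions: S_TID and S_TIME are hypothetical bits loosely modelled on PERF_SAMPLE_TID and PERF_SAMPLE_TIME (not their real values), and the record layout is simplified. A generic loop is instantiated once per flag combination with a constant flags argument, so an optimizing compiler can drop the untaken branches, and the variant is chosen once per buffer rather than per record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sample bits, loosely modelled on PERF_SAMPLE_*; not the real values. */
#define S_TID  (1u << 0)
#define S_TIME (1u << 1)
#define S_ALL  (S_TID | S_TIME)

struct bts_record {
	uint64_t from;
	uint64_t to;
};

/* Generic loop; returns the number of bytes written to out. */
static inline size_t drain_generic(unsigned char *out, const struct bts_record *rec,
				   size_t n, unsigned flags,
				   uint32_t pid, uint32_t tid, uint64_t stamp)
{
	unsigned char *p = out;

	for (size_t i = 0; i < n; i++) {
		if (flags & S_TID) {
			memcpy(p, &pid, sizeof(pid)); p += sizeof(pid);
			memcpy(p, &tid, sizeof(tid)); p += sizeof(tid);
		}
		if (flags & S_TIME) {
			memcpy(p, &stamp, sizeof(stamp)); p += sizeof(stamp);
		}
		memcpy(p, &rec[i].from, sizeof(uint64_t)); p += sizeof(uint64_t);
		memcpy(p, &rec[i].to, sizeof(uint64_t)); p += sizeof(uint64_t);
	}
	return (size_t)(p - out);
}

typedef size_t (*drain_fn)(unsigned char *, const struct bts_record *, size_t,
			   uint32_t, uint32_t, uint64_t);

/* Each instantiation passes a constant 'flags' to the generic loop. */
#define DEFINE_DRAIN(name, flags)					\
static size_t name(unsigned char *out, const struct bts_record *rec,	\
		   size_t n, uint32_t pid, uint32_t tid, uint64_t stamp)	\
{									\
	return drain_generic(out, rec, n, (flags), pid, tid, stamp);	\
}

DEFINE_DRAIN(drain_plain, 0)
DEFINE_DRAIN(drain_tid, S_TID)
DEFINE_DRAIN(drain_time, S_TIME)
DEFINE_DRAIN(drain_tid_time, S_ALL)

static const drain_fn drain_table[S_ALL + 1] = {
	[0]      = drain_plain,
	[S_TID]  = drain_tid,
	[S_TIME] = drain_time,
	[S_ALL]  = drain_tid_time,
};

/* Select the specialized loop once for the whole buffer. */
static size_t drain(unsigned flags, unsigned char *out, const struct bts_record *rec,
		    size_t n, uint32_t pid, uint32_t tid, uint64_t stamp)
{
	assert(flags <= S_ALL);
	return drain_table[flags](out, rec, n, pid, tid, stamp);
}
```

With n sample bits this yields 2^n instantiations, which is why it is only worth it if the generic path turns out to be too slow.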

Holding the output lock that long does not seem to be a problem.
I can run 'perf record -a -o /dev/null -e branches -c 1' without getting
an hrtimer warning in dmesg.

When I run 'perf record -e branches -c 1 true' in parallel, however, I do
not get any trace, and perf does not report an error either.

I copied some of the generic sampling code; I'll try to restructure it
a bit so I can call a generic function to do the actual sampling - provided
this is still fast enough.

How would we make sure it works on other boxes, as well?
Is there a way for me to detect that I'm not handling the interrupt fast enough?
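
A possible way to detect that the handler is too slow, sketched under assumptions: struct drain_stats, drain_over_budget() and the max_percent budget are hypothetical, not an existing kernel interface, and clock_gettime(CLOCK_MONOTONIC) stands in for whatever clock the interrupt handler would read. The idea is to time each drain and compare a running average against a share of the interval between two overflow interrupts.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical bookkeeping; not an existing kernel interface. */
struct drain_stats {
	uint64_t avg_ns;       /* running average of drain time, 0 = no sample yet */
	unsigned max_percent;  /* budget: share of the inter-interrupt interval */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * Fold one measurement into the average (new sample weighted 1/8) and report
 * whether draining now takes more than max_percent of the time between
 * two overflow interrupts.
 */
static bool drain_over_budget(struct drain_stats *st, uint64_t drain_ns,
			      uint64_t interval_ns)
{
	assert(st->max_percent <= 100);
	if (st->avg_ns == 0)
		st->avg_ns = drain_ns;
	else
		st->avg_ns = st->avg_ns - st->avg_ns / 8 + drain_ns / 8;
	return st->avg_ns * 100 > interval_ns * st->max_percent;
}
```

In a handler this would be used as t0 = now_ns(), drain the buffer, then drain_over_budget(&st, now_ns() - t0, t0 - last_t0), where last_t0 is the start of the previous interrupt; a true result could trigger a warning or throttling.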

I found another "kernel hangs" bug that is reproducible with 'normal' profiling:

  1. run 'sudo perf record -e instructions -c 1000000 -a -o /dev/null'
  2. unplug and replug one CPU
  3. kill the perf record job

After step 3, the kernel hangs.

thanks and regards,

Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052
