>> My current theory is that the BTS buffer fills up so quickly when tracing
>> the kernel, that the kernel is busy handling overflows and reacting on
>> other interrupts that pile up while we're handling the BTS overflow.
>> When I trace user-mode branches, it works.
>> When I do not copy the trace during overflow handling, the kernel does not hang.
>Agreed, that was my suspicion as well. Would you happen to know where to
>get these USB debug port cables, and how to find out if a machine
>supports this?

I'm sorry, but I don't understand what you mean by "these USB debug port cables".

>> I do need 3 buffers of 2048 entries = 3x48 pages per cpu, though.
>And those pages have to be contiguous too, right? That's an order-6
>alloc, painful.

According to an earlier discussion with Roland, they don't have to be
contiguous. They still need to be locked, though.
According to some other discussion with Andrew and Ingo, I still use
kmalloc to allocate those buffers.

>> One buffer
>> to switch in during overflow handling; another to switch in during sched_out
>> (assuming that we need to schedule out the traced task before we may start
>> the draining task). Even then, there's a chance that we will lose trace
>> when the draining task may not start immediately. I would even say that
>> this is quite likely.
>Right, is it possible to detect this loss?

It is. But in order to get the PERF_EVENT_LOST record into the correct place,
I need to defer logging the lost trace ;-)

And we would lose this very nice feature of fixed-size entries.
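
To make the deferral concrete, here is a rough user-space sketch of how the
loss could be recorded at overflow time and only logged at drain time. All
names (sketch_buf, sketch_cpu, sketch_overflow, sketch_drain) are made up for
illustration and are not in the patch; it also assumes that a buffer with no
spare available is simply reused from the top, dropping its old contents.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NBUF 3	/* hypothetical: three buffers per cpu */

/* Hypothetical per-buffer state; not the layout used by the patch. */
struct sketch_buf {
	int in_use;		/* full, waiting for the draining task */
	unsigned long lost;	/* entries dropped before the current contents */
};

struct sketch_cpu {
	struct sketch_buf buf[SKETCH_NBUF];
	int active;		/* buffer currently being written to */
};

/*
 * Overflow handling: switch to a spare buffer if there is one.
 * Returns 1 on a successful switch. Returns 0 if every other buffer is
 * still waiting to be drained; the active buffer is then reused and the
 * dropped entries are only counted, not logged.
 */
static int sketch_overflow(struct sketch_cpu *cpu, unsigned long entries)
{
	for (int i = 0; i < SKETCH_NBUF; i++) {
		if (i == cpu->active || cpu->buf[i].in_use)
			continue;
		cpu->buf[cpu->active].in_use = 1;
		cpu->active = i;
		return 1;
	}
	cpu->buf[cpu->active].lost += entries;
	return 0;
}

/*
 * Deferred draining: returns the number of entries lost before this
 * buffer's contents, so the caller can emit the lost record ahead of
 * the buffer's entries, then releases the buffer.
 */
static unsigned long sketch_drain(struct sketch_cpu *cpu, int i)
{
	unsigned long lost = cpu->buf[i].lost;

	cpu->buf[i].lost = 0;
	cpu->buf[i].in_use = 0;
	return lost;
}
```

The lost count travels with the buffer, so the record ends up in front of the
trace that follows the gap rather than wherever the overflow happened to be
noticed.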

>This makes me wonder how much time it takes to drain these buffers. Is
>it at all possible to optimize that code path into oblivion, or will
>nothing be fast enough?

Are you saying that we should rather speed up that code path than try to
defer all the work? There definitely is a lot of redundant work done on
the generic path.

I did a few experiments where I would drain only part of the buffer.
I could not drain much before the system would hang.
Besides, that does not sound too robust to me. Would it still work on
a slower system? Or on a faster one? Or on a fully loaded one?

>> What I do not have, yet, is the actual draining. Draining needs to start
>> after the counter has been disabled. But draining needs the perf_counter
>> to drain the trace into. The counter will thus be busy after it has been
>> disabled - ugly.
>Yes, this is a tad weird...

Hmmm, since the counter is removed during sched_out, and I can't drain the
buffer in x86_pmu_perf_disable(), I'm afraid we don't have much choice.

>> There already seems to be something in place regarding deferring work, i.e.
>> perf_counter_do_pending(). Would it be OK if I added the deferred BTS buffer
>> draining to that?
>Yes, note that this pending work runs from hardirq context as well. On
>x86 we self-ipi to get into IRQ context ASAP after the NMI.
>So if the remote cpu is blocked waiting on an SMP call, doing the work
>from hardirq context won't really help things.

I can't use that, then.

If I use schedule_work() instead, how would I ensure that the work is done
before the traced (or tracing) task is rescheduled?

I would need to ensure that the counter does not go away as long as draining
work is scheduled. I would store a pointer to the counter in the work struct.
Should perf_counters be use-counted?
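
What I have in mind for the use-counting is roughly the following. This is
only a user-space sketch with C11 atomics to show the lifetime rule; the
names (sketch_counter, sketch_drain_work and the helpers) are made up and are
not the perf_counter API, and the "freed" flag merely stands in for actually
freeing the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a counter; not the kernel's struct. */
struct sketch_counter {
	atomic_int refs;	/* holders, including the owner */
	bool freed;		/* set where the real code would free */
};

/* Hypothetical deferred-drain work item that pins its counter. */
struct sketch_drain_work {
	struct sketch_counter *counter;
};

static void sketch_counter_init(struct sketch_counter *c)
{
	atomic_init(&c->refs, 1);	/* the owner's reference */
	c->freed = false;
}

static void sketch_counter_get(struct sketch_counter *c)
{
	atomic_fetch_add(&c->refs, 1);
}

static void sketch_counter_put(struct sketch_counter *c)
{
	int old = atomic_fetch_sub(&c->refs, 1);

	assert(old > 0);
	if (old == 1)		/* last reference dropped */
		c->freed = true;
}

/* Queue draining: take a reference before storing the pointer. */
static void sketch_queue_drain(struct sketch_drain_work *w,
			       struct sketch_counter *c)
{
	sketch_counter_get(c);
	w->counter = c;
}

/* Run the deferred work, then drop the reference taken when queuing. */
static void sketch_run_drain(struct sketch_drain_work *w)
{
	/* ... the trace would be drained into w->counter here ... */
	sketch_counter_put(w->counter);
	w->counter = NULL;
}
```

With this, the owner can drop its reference while draining is still queued,
and the counter only goes away once the work has run.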

>> Looks like this would guarantee that the counter does not go away as long
>> as there is work pending. Is this correct?
>Agreed, it waits for all pending bits to complete before destroying the

That would be an alternative to use-counting.

>> In any case, this is getting late for the upcoming merge window.
>> Would you rather drop the BTS patch or disable kernel tracing?
>I don't think we need to drop it, at worst we could defer the patch
>to .33, but I think we can indeed get away with disabling the kernel
>tracing for now.

There's some more review feedback from you that I have not yet integrated.
One is that BTS should return an error instead of falling back to generic.
Another is that BTS does not provide a counter value.

How important are those?

thanks and regards,

