[BUG] perf: multiplexing and hotplug CPU problem

From: Stephane Eranian
Date: Wed Sep 12 2012 - 10:40:51 EST


As I was debugging my hrtimer patch, I ran a few tests
with hotplug CPU. In other words, I offlined a CPU while
there was an active monitoring session that causes multiplexing.

When the CPU goes down, all is well. But when it comes back,
things go wrong. The kernel does not crash, but the results are
wrong and multiplexing does not work anymore.

I investigated this some more and found out there is an issue
on re-activation.

During shutdown, system-wide events are scheduled out AND removed
from the event lists. Consequently, ctx->nr_events and ctx->nr_active
go to zero.

When the CPU is brought back online and tools do start/stop on the
events, the events can be scheduled back in, which increments
ctx->nr_active. Because list_add_event() is not called again, you may
end up with ctx->nr_events < ctx->nr_active, which is wrong. The events
may not be on any list, and therefore they cannot get multiplexed again.
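
To make the counter mismatch concrete, here is a small user-space toy
model. It is only a sketch, not the kernel code: the toy_* names, the
two structs, and the assumption that scheduling in does not check list
membership are all mine, chosen to mirror the sequence described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins: only the two counters and two flags discussed above. */
struct toy_ctx {
	int nr_events;	/* events currently on the context lists */
	int nr_active;	/* events currently scheduled in */
};

struct toy_event {
	bool on_list;
	bool active;
};

static void toy_list_add_event(struct toy_ctx *ctx, struct toy_event *e)
{
	if (!e->on_list) {
		e->on_list = true;
		ctx->nr_events++;
	}
}

static void toy_list_del_event(struct toy_ctx *ctx, struct toy_event *e)
{
	if (e->on_list) {
		e->on_list = false;
		ctx->nr_events--;
	}
	assert(ctx->nr_events >= 0);
}

/* Assumption: scheduling in does not check list membership. */
static void toy_event_sched_in(struct toy_ctx *ctx, struct toy_event *e)
{
	if (!e->active) {
		e->active = true;
		ctx->nr_active++;
	}
}

static void toy_event_sched_out(struct toy_ctx *ctx, struct toy_event *e)
{
	if (e->active) {
		e->active = false;
		ctx->nr_active--;
	}
}

/* CPU going down, as described: schedule out AND drop from the lists. */
static void toy_cpu_offline(struct toy_ctx *ctx, struct toy_event *ev, int n)
{
	for (int i = 0; i < n; i++) {
		toy_event_sched_out(ctx, &ev[i]);
		toy_list_del_event(ctx, &ev[i]);
	}
}

/* Only events still on a list can be picked up for multiplexing. */
static int toy_nr_rotatable(const struct toy_event *ev, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (ev[i].on_list)
			count++;
	return count;
}

/* The invariant that gets broken: active events must be on a list. */
static bool toy_ctx_consistent(const struct toy_ctx *ctx)
{
	return ctx->nr_active <= ctx->nr_events;
}
```

In this model, calling toy_cpu_offline() and then toy_event_sched_in()
on one event leaves nr_active at 1 while nr_events stays at 0, and
toy_nr_rotatable() finds nothing to rotate.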

It is not clear to me why we need to remove the events from any
list (list_del_event) when the CPU goes down.

Why isn't calling event_sched_out() enough?
If events are kept on lists, why not try to schedule them back in when
the CPU is brought back online?
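
For comparison, here is a toy sketch of the alternative I am asking
about: on offline, only schedule the events out and keep them on the
lists; on online, try to schedule them back in. Again this is a
user-space model and not a kernel patch. The toy_* names and the
max_active parameter (standing in for the number of available
counters) are hypothetical.

```c
#include <stdbool.h>

struct toy_ctx {
	int nr_events;	/* events currently on the context lists */
	int nr_active;	/* events currently scheduled in */
};

struct toy_event {
	bool on_list;
	bool active;
};

/* Attach an event to the context once; it stays attached across hotplug. */
static void toy_attach(struct toy_ctx *ctx, struct toy_event *e)
{
	if (!e->on_list) {
		e->on_list = true;
		ctx->nr_events++;
	}
}

/* Offline: schedule out only, leave list membership untouched. */
static void toy_cpu_offline_keep_lists(struct toy_ctx *ctx,
				       struct toy_event *ev, int n)
{
	for (int i = 0; i < n; i++) {
		if (ev[i].active) {
			ev[i].active = false;
			ctx->nr_active--;
		}
	}
}

/* Online: schedule in listed events until the counters run out. */
static void toy_cpu_online_resched(struct toy_ctx *ctx,
				   struct toy_event *ev, int n,
				   int max_active)
{
	for (int i = 0; i < n; i++) {
		if (ctx->nr_active >= max_active)
			break;
		if (ev[i].on_list && !ev[i].active) {
			ev[i].active = true;
			ctx->nr_active++;
		}
	}
}
```

With three attached events and max_active set to 2, an offline/online
cycle in this model ends with nr_events at 3 and nr_active at 2, so
nr_active never exceeds nr_events and the unscheduled event is still
on a list where rotation could find it.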