On Fri, Aug 16, 2019 at 10:49:10AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
perf stat -M metrics relies on weak groups to reject unschedulable
groups and run them as non-groups.
This uses the group validation code in the kernel. Unfortunately
that code doesn't take pinned events, such as the NMI watchdog, into
account. So some groups can pass validation, but then later still
never schedule.
But if you first create the group and then a pinned event it 'works',
which is inconsistent and makes all this timing dependent.
@@ -2011,9 +2011,11 @@ static int validate_event(struct perf_event *event)
*/
static int validate_group(struct perf_event *event)
{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct perf_event *leader = event->group_leader;
struct cpu_hw_events *fake_cpuc;
- int ret = -EINVAL, n;
+ struct perf_event *pinned_event;
+ int ret = -EINVAL, n, i;
fake_cpuc = allocate_fake_cpuc();
if (IS_ERR(fake_cpuc))
@@ -2033,6 +2035,24 @@ static int validate_group(struct perf_event *event)
if (n < 0)
goto out;
+ /*
+ * The new group must can be scheduled
+ * together with current pinned events.
+ * Otherwise, it will never get a chance
+ * to be scheduled later.
That's wrapped short; also I don't think it is sufficient; what if you
happen to have a pinned event on CPU1 (and not others) and happen to run
validation for a new CPU1 event on CPUn ?
Also; per that same; it is broken, you're accessing the cpu-local cpuc
without serialization.
+ */
+ for (i = 0; i < cpuc->n_events; i++) {
+ pinned_event = cpuc->event_list[i];
+ if (WARN_ON_ONCE(!pinned_event))
+ continue;
+ if (!pinned_event->attr.pinned)
+ continue;
+ fake_cpuc->n_events = n;
+ n = collect_events(fake_cpuc, pinned_event, false);
+ if (n < 0)
+ goto out;
+ }
+
fake_cpuc->n_events = 0;
ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
--
2.7.4