Re: [PATCH] perf/rapl: restart perf rapl counter after resume

From: Liang, Kan
Date: Thu Jun 20 2019 - 10:39:08 EST




On 6/20/2019 8:50 AM, Peter Zijlstra wrote:
On Mon, Jun 17, 2019 at 09:41:37PM +0800, Zhang Rui wrote:

After S3 suspend/resume, "perf stat -I 1000 -e power/energy-pkg/ -a"
reports an insane value for the very first sampling period after resume
as shown below.

ÂÂÂÂ19.278989977ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ2.16 Joules power/energy-pkg/
ÂÂÂÂ20.279373569ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ1.96 Joules power/energy-pkg/
ÂÂÂÂ21.279765481ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ2.09 Joules power/energy-pkg/
ÂÂÂÂ22.280305420ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ2.10 Joules power/energy-pkg/
ÂÂÂÂ25.504782277ÂÂÂ4,294,966,686.01 Joules power/energy-pkg/
ÂÂÂÂ26.505114993ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ3.58 Joules power/energy-pkg/
ÂÂÂÂ27.505471758ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ1.66 Joules power/energy-pkg/

Fix this by resetting the counter right after resume.

Cute...


+#ifdef CONFIG_PM
+
+static int perf_rapl_suspend(void)
+{
+ int i;
+
+ get_online_cpus();
+ for (i = 0; i < rapl_pmus->maxpkg; i++)
+ rapl_pmu_update_all(rapl_pmus->pmus[i]);
+ put_online_cpus();
+ return 0;
+}
+
+static void perf_rapl_resume(void)
+{
+ int i;
+
+ get_online_cpus();
+ for (i = 0; i < rapl_pmus->maxpkg; i++)
+ rapl_pmu_restart_all(rapl_pmus->pmus[i]);
+ put_online_cpus();
+}

What's the reason for that get/put_online_cpus() here ?


It looks like syscore_* functions are executed with one CPU on-line.
If so, they may not be the right place for the rapl callback.

Rapl is per socket. The driver manipulates the registers on the first CPU of each socket. I think we need to update/restart the counters on all sockets. That's the reason I add get/put_online_cpus() in the original patch.

Besides, I think we also need to call rapl_pmu_restart/update_all() on the target cpu.


Thanks,
Kan