Re: [PATCH 1/2] Modify cpupower to schedule itself on cores it is reading MSRs from

From: Natarajan, Janakarajan
Date: Mon Sep 30 2019 - 11:34:14 EST


On 9/27/2019 1:59 PM, shuah wrote:
> On 9/18/19 10:34 AM, Natarajan, Janakarajan wrote:
>> Modify cpupower to schedule itself on each of the cpus in the system and
>> then get the APERF/MPERF register values.
>>
>> This is advantageous because an IPI is not generated when a
>> read_msr() is
>> executed on the local logical CPU thereby reducing the chance of having
>> APERF and MPERF being out of sync.
>
> Somehow this doesn't read right. Is this that you are trying to avoid
> APERF and MPERF being out of sync with this change?
>
> This description is rather confusing.


We are trying to avoid a separate IPI for APERF and MPERF. We have seen
that a separate IPI for

each can cause APERF and MPERF to go out of sync.


If the cpupower is already executing on the core it wants to get the
APERF/MPERF from, then

generic_exec_single() does not generate the IPI to execute the rdmsr().
Rather it just executes

on the current cpu.


I can change the description to add more detail.


>
>>
>> Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@xxxxxxx>
>> ---
>> Â .../utils/idle_monitor/mperf_monitor.cÂÂÂÂÂÂÂ | 38 ++++++++++++++-----
>> Â 1 file changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
>> b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
>> index 44806a6dae11..8b072e39c897 100644
>> --- a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
>> +++ b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
>> @@ -10,6 +10,7 @@
>> Â #include <stdlib.h>
>> Â #include <string.h>
>> Â #include <limits.h>
>> +#include <sched.h>
>> Â Â #include <cpufreq.h>
>> Â @@ -86,15 +87,33 @@ static int mperf_get_tsc(unsigned long long *tsc)
>> ÂÂÂÂÂ return ret;
>> Â }
>> Â +static int get_aperf_mperf(int cpu, unsigned long long *aval,
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ unsigned long long *mval)
>> +{
>> +ÂÂÂ cpu_set_t set;
>> +ÂÂÂ int ret;
>> +
>> +ÂÂÂ CPU_ZERO(&set);
>> +ÂÂÂ CPU_SET(cpu, &set);
>> +ÂÂÂ if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
>> +ÂÂÂÂÂÂÂ dprint("Could not migrate to cpu: %d\n", cpu);
>> +ÂÂÂÂÂÂÂ return 1;
>> +ÂÂÂ }
>> +
>> +ÂÂÂ ret = read_msr(cpu, MSR_APERF, aval);
>> +ÂÂÂ ret |= read_msr(cpu, MSR_MPERF, mval);
>> +
>> +ÂÂÂ return ret;
>> +}
>> +
>> Â static int mperf_init_stats(unsigned int cpu)
>> Â {
>> -ÂÂÂ unsigned long long val;
>> +ÂÂÂ unsigned long long aval, mval;
>> ÂÂÂÂÂ int ret;
>> Â -ÂÂÂ ret = read_msr(cpu, MSR_APERF, &val);
>> -ÂÂÂ aperf_previous_count[cpu] = val;
>> -ÂÂÂ ret |= read_msr(cpu, MSR_MPERF, &val);
>> -ÂÂÂ mperf_previous_count[cpu] = val;
>> +ÂÂÂ ret = get_aperf_mperf(cpu, &aval, &mval);
>
> get_aperf_mperf() could return error right? It returns 1 when
> sched_setaffinity() fails. Shouldn't the return value checked,
> instead of using aval and mval?


We set the is_valid[cpu] to the return value. Later on the is_valid[cpu]

is checked before proceeding to calculate the effective freq.


Thanks.


>
>> +ÂÂÂ aperf_previous_count[cpu] = aval;
>> +ÂÂÂ mperf_previous_count[cpu] = mval;
>> ÂÂÂÂÂ is_valid[cpu] = !ret;
>> Â ÂÂÂÂÂ return 0;
>> @@ -102,13 +121,12 @@ static int mperf_init_stats(unsigned int cpu)
>> Â Â static int mperf_measure_stats(unsigned int cpu)
>> Â {
>> -ÂÂÂ unsigned long long val;
>> +ÂÂÂ unsigned long long aval, mval;
>> ÂÂÂÂÂ int ret;
>> Â -ÂÂÂ ret = read_msr(cpu, MSR_APERF, &val);
>> -ÂÂÂ aperf_current_count[cpu] = val;
>> -ÂÂÂ ret |= read_msr(cpu, MSR_MPERF, &val);
>> -ÂÂÂ mperf_current_count[cpu] = val;
>> +ÂÂÂ ret = get_aperf_mperf(cpu, &aval, &mval);
>
> Same comments as above here.
>
>> +ÂÂÂ aperf_current_count[cpu] = aval;
>> +ÂÂÂ mperf_current_count[cpu] = mval;
>> ÂÂÂÂÂ is_valid[cpu] = !ret;
>> Â ÂÂÂÂÂ return 0;
>>
>
> thanks,
> -- Shuah