Re: [RFC PATCH 0/3] perf: show package power consumption in perf

From: Peter Zijlstra
Date: Wed Aug 18 2010 - 08:26:26 EST


On Wed, 2010-08-18 at 15:59 +0800, Zhang Rui wrote:
> Hi, all,
>
> RAPL(running average power limit) is a new feature which provides
> mechanisms to enforce power consumption limit, on some new processors.
>
> Generally speaking, by using RAPL, OS can set a power budget in a
> certain time window, and let Hardware to throttle the processor
> P/T-state to meet this energy limitation.
>
> RAPL also provides a new MSR, i.e. MSR_PKG_ENERGY_STATUS, which reports
> the total amount of energy consumed by the package.
>
> I'm not sure if to support RAPL or not, but anyway, it sounds like a
> good idea to export the energy status in perf.
>
> So a new perf pmu and event to show the package energy consumed is
> introduced in this patch.
>
> Here is what I get after applying the three patches,
>
> #./perf stat -e energy test
> Performance counter stats for 'test':
>
> 202 Joules cost by package
> 7.926001238 seconds time elapsed
>
>
> Note that this patch set is made based on Peter's perf-pmu branch,
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git
> which provides better interfaces to register/unregister a new pmu.
>
> any comment are welcome. :)


Nice,.. however:

- if it is a pure read-only counter without sampling support,
expose it as such, don't fudge in the hrtimer stuff. Simply
fail to create a sampling event.

SH has the same problem for its 'normal' PMU, the solution is
to use event groups, Matt was looking at adding support to
perf-record for that, if creating a sampling event fails, fall
back to {hrtimer, $event} groups.

- since its a free-running, non-configurable counter, you can indeed
act like its a 'software' event in that you can schedule consumers
without constraints, however I don't think the PERF_COUNT_SW_* space
is the right way to expose this counter.

Better would be to use the sysfs stuff Lin has been working on (for
which I still need to catch up on the latest discussions), it would
then be tied to the pmu instance and appear/disappear when you load/
unload the module.

However for testing purposes I see why you'd want to have _a_
interface :-)

- it would be nice if you'd write the cpu detection a bit more readable,
also, it looks like you forgot to check x86_vendor == X86_VENDOR_INTEL.

> +static int __init intel_rapl_init(void)
> +{
> + /*
> + * RAPL features are only supported on processors have a CPUID
> + * signature with DisplayFamily_DisplayModel of 06_2AH, 06_2DH
> + */
> + if (boot_cpu_data.x86 != 0x06 ||
> + (boot_cpu_data.x86_model != 0x2A &&
> + boot_cpu_data.x86_model != 0x2D))
> + return -ENODEV;
> +
> + if (rapl_check_unit())
> + return -ENODEV;
> +
> + perf_pmu_register(&rapl_pmu);
> + return 0;
> +}

Maybe something like (see intel_pmu_init() for example):

if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return -ENODEV;

if (boot_cpu_data.x86 != 0x06)
return -ENODEV;

switch (boot_cpu_data.x86_model) {
case 0x2A: /* sandybridge ?! 32nm */
case 0x2D: /* othermodel 32nm */
break;

default:
return -ENODEV;
}

Which again reminds me to ask of Intel, a comprehensive x86_model list,
please?

Alternatively, you can create a X86_FEATURE_RAPL and simply use
boot_cpu_has(X86_FEATURE_RAPL) (much like intel_ds_init() has).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/