Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

From: Charles (Chas) Williams
Date: Thu Oct 27 2016 - 15:00:49 EST


On 10/25/2016 08:22 AM, Sebastian Andrzej Siewior wrote:
On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote:
[ 3.107126] init_rapl_pmus: maxpkg 4
There! A VMware bug. It probably worked by chance.

Yes, the behavior is a bit random.

I assume "init_rapl_pmus: maxpkg 4" comes from init_rapl_pmus() printing
the value returned by topology_max_packages(). So it says 4, but then the
package id for CPUs 2 and 3 comes back as 65535. That -1 probably comes
from topology_update_package_map(). Could you please send a complete boot
log and try the following patch? It should fix your boot problem and
disable RAPL if the topology info is invalid.

But sometimes the topology info is correct, and if I get lucky, the
package id could be valid for all the CPUs. Given the behavior I have
seen so far, I think RAPL isn't being emulated at all. So even if I did
boot onto a "valid" set of cores, could I be certain that I would always
stay on those cores?

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
 {
 	int maxpkg = topology_max_packages();
 	size_t size;
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (topology_logical_package_id(cpu) >= maxpkg) {
+			pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
+			       maxpkg, cpu, topology_logical_package_id(cpu));
+			return -EINVAL;
+		}
+	}
 
 	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
 	rapl_pmus = kzalloc(size, GFP_KERNEL);

Per your request in your next email:

One thing I forgot to ask: Could you please check if you get the same
pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
rework).

Our previous kernel was 4.4, which didn't use the logical package id:

	/* check if phys_is is already covered */
	for_each_cpu(i, &rapl_cpu_mask) {
		if (phys_id == topology_physical_package_id(i))
			return;