RE: x86_mce: mce_start uses number of physical cores instead of logical cores

From: Ming Lei
Date: Fri May 10 2013 - 14:42:36 EST


With Hyper-Threading turned on, num_online_cpus() reports the number of all logical cores. What I found in testing is that only half of the cores receive the MCE broadcast, so I assume only the physical cores get the broadcast. I have two E5645 sockets on board: num_online_cpus() returns 24, but only 12 cores enter do_machine_check(). I tested with both EDAC error injection and a hardware EDAC error injector.

cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores gives the ratio of logical cores to physical cores, i.e. threads per core. In my case it is two.

Here is the Intel spec:
Processor Number E5645
# of Cores 6
# of Threads 12

Ming

-----Original Message-----
From: Luck, Tony [mailto:tony.luck@xxxxxxxxx]
Sent: Friday, May 10, 2013 11:14 AM
To: Ming Lei; linux-kernel@xxxxxxxxxxxxxxx
Cc: mchehab@xxxxxxxxxx; bp@xxxxxxxxx
Subject: RE: x86_mce: mce_start uses number of physical cores instead of logical cores

> +#if NR_CPUS > 1
> + cpus /= cpumask_weight(cpu_core_mask(0)) / cpu_data(0).booted_cores;
> +#endif

Not entirely sure what you are trying to do here (apart from making "cpus"
be a smaller number). What is the reasoning behind the right-hand side of this expression?

Is this problem more related to how EDAC is injecting an error? When I've used other methods (e.g. ACPI/EINJ) I end up with a machine check that is broadcast to all processors ... so "cpus = num_online_cpus()" is the correct[1] number of processors to wait for.

-Tony

[1] Andi may point me (again) to a fix to help deal with the case where Linux has taken some CPUs offline. In that case this code is wrong, because the "offline"
CPUs will still show up for machine checks. But there are troubling corner cases with the fix.