[Patch V1 3/3] x86, mce: Account for offline CPUs during MCE rendezvous.

From: Ashok Raj
Date: Thu Sep 24 2015 - 00:50:40 EST

Next message: Paul E. McKenney: "Re: crisv32 runtime failure in -next due to 'page-flags: define behavior SL*B-related flags on compound pages'"
Previous message: Ashok Raj: "[Patch V1 1/3] x86, mce: MCE log size not enough for high core parts"
In reply to: Ashok Raj: "[Patch V1 2/3] x86, mce: Refactor parts of mce_log() to reuse when logging from offline CPUs"
Next in thread: Borislav Petkov: "Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts"

Linux supports logical CPU offline, as shown below.

# echo 0 > /sys/devices/system/cpu/cpuX/online

Hardware doesn't know about CPUs offlined by the OS, hence it will
continue to broadcast any MCE to all CPUs in the system, including the
offlined ones. CPUs offlined by the OS are still in the MCE domain, so
they also have to run the int18 handler. Hence mce_start() and mce_end()
should use cpu_present_map to count CPUs in the rendezvous. Current code
only accounts for online CPUs, so mce_callin ends up higher than expected
by the number of offlined CPUs.
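
To illustrate the accounting mismatch, here is a minimal userspace sketch.
It is not kernel code: NCPUS and the helper names are made up for this
example, and it only models the counting, assuming every present CPU
receives the broadcast and calls in.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 8	/* hypothetical number of CPU slots in this model */

/* Every present CPU receives the broadcast and calls in, online or not. */
static int callins_on_broadcast(const bool present[NCPUS])
{
	int n = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (present[cpu])
			n++;
	return n;
}

/* What a count based only on online CPUs would expect to see. */
static int expected_from_online(const bool present[NCPUS],
				const bool online[NCPUS])
{
	int n = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (present[cpu] && online[cpu])
			n++;
	return n;
}

/* Call-ins beyond the online-based expectation: one per offlined CPU. */
static int callin_surplus(const bool present[NCPUS],
			  const bool online[NCPUS])
{
	int surplus = callins_on_broadcast(present) -
		      expected_from_online(present, online);

	assert(surplus >= 0);
	return surplus;
}
```

With all eight CPUs present and two of them offlined, the surplus in this
model is two, which is the gap that counting present CPUs closes.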

The main benefit is that in the odd case where an offline CPU is the
source of the MCE, the kernel will be able to capture logs properly even
for that CPU.

This patch does the following:

- Allow offline CPUs to participate in the MCE rendezvous process.
- Ensure an offline CPU will not be chosen as the rendezvous master CPU.
- Collect logs from the offline CPU and report them via the rendezvous master.
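
The master election rule can be sketched in userspace as below. This is a
single-threaded model under stated assumptions, not the kernel
implementation: MAX_CPUS, elect_monarch() and the deferred queue are
hypothetical, and the queue stands in for the offline CPU spinning until
the call-in counter becomes non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 16	/* hypothetical upper bound for this model */

/*
 * CPUs arrive in arrival[] order. An offline CPU that arrives while the
 * call-in counter is still zero is deferred until an online CPU has
 * taken order 1. order_out[] must have MAX_CPUS entries; each CPU's
 * order is stored there, or 0 if it never got one.
 * Returns the master's CPU number, or -1 if no online CPU arrived.
 */
static int elect_monarch(const int *arrival, size_t n,
			 const bool *online, int *order_out)
{
	int deferred[MAX_CPUS];
	size_t ndeferred = 0;
	int callin = 0;
	int monarch = -1;

	assert(n <= MAX_CPUS);
	for (size_t i = 0; i < MAX_CPUS; i++)
		order_out[i] = 0;

	for (size_t i = 0; i < n; i++) {
		int cpu = arrival[i];

		assert(cpu >= 0 && cpu < MAX_CPUS);

		/* Offline and nobody has called in yet: wait. */
		if (!online[cpu] && callin == 0) {
			deferred[ndeferred++] = cpu;
			continue;
		}

		order_out[cpu] = ++callin;
		if (callin == 1) {
			/* First call-in is always an online CPU here. */
			monarch = cpu;
			/* Waiting offline CPUs may now take their order. */
			for (size_t j = 0; j < ndeferred; j++)
				order_out[deferred[j]] = ++callin;
			ndeferred = 0;
		}
	}
	return monarch;
}
```

In this model, if offline CPU 2 arrives first, followed by CPUs 0, 1 and 3,
CPU 0 takes order 1 and becomes the master while CPU 2 gets order 2. If
only offline CPUs arrive, nobody is elected, which is why the sketch
returns -1 in that case.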

Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
---
arch/x86/kernel/cpu/mcheck/mce.c | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2df073d..080eefe 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -195,8 +195,6 @@ static void mce_log_add(struct mce *mce)

void mce_log(struct mce *mce)
{
- unsigned next, entry;
-
/* Emit the trace record: */
trace_mce_record(mce);

@@ -756,8 +754,14 @@ static void mce_reign(void)
* This CPU is the Monarch and the other CPUs have run
* through their handlers.
* Grade the severity of the errors of all the CPUs.
+ * Intel CPUs broadcast MCEs to all CPUs booted. Even if they are
+ * parked in idle due to logical CPU offline. Hence we should count
+ * all CPUs to process MCEs.
+ * Intel CPUs broadcast MCEs to all CPUs booted. Even if they are
+ * parked in idle due to logical CPU offline. Hence we should count
+ * all CPUs to process MCEs.
*/
- for_each_possible_cpu(cpu) {
+ for_each_present_cpu(cpu) {
int severity = mce_severity(&per_cpu(mces_seen, cpu),
mca_cfg.tolerant,
&nmsg, true);
@@ -809,8 +813,9 @@ static atomic_t global_nwo;
static int mce_start(int *no_way_out)
{
int order;
- int cpus = num_online_cpus();
+ int cpus = num_present_cpus();
u64 timeout = (u64)mca_cfg.monarch_timeout * NSEC_PER_USEC;
+ unsigned int this_cpu = smp_processor_id();

if (!timeout)
return -1;
@@ -820,6 +825,16 @@ static int mce_start(int *no_way_out)
* global_nwo should be updated before mce_callin
*/
smp_wmb();
+
+ /*
+ * If this cpu is offline, make sure it won't be elected as
+ * the rendezvous master
+ */
+ if (cpu_is_offline(this_cpu)) {
+ while (!atomic_read(&mce_callin))
+ ndelay(SPINUNIT);
+ }
+
order = atomic_inc_return(&mce_callin);

/*
@@ -890,7 +905,7 @@ static int mce_end(int order)

if (order == 1) {
/* CHECKME: Can this race with a parallel hotplug? */
- int cpus = num_online_cpus();
+ int cpus = num_present_cpus();

/*
* Monarch: Wait for everyone to go through their scanning
@@ -984,6 +999,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
int i;
int worst = 0;
int severity;
+ unsigned int cpu = smp_processor_id();
+
/*
* Establish sequential order between the CPUs entering the machine
* check handler.
@@ -1098,7 +1115,10 @@ void do_machine_check(struct pt_regs *regs, long error_code)
m.severity = severity;
m.usable_addr = mce_usable_address(&m);

- mce_log(&m);
+ if (cpu_is_offline(cpu))
+ mce_log_add(&m);
+ else
+ mce_log(&m);

if (severity > worst) {
*final = m;
--
2.4.3

