On Tue, Mar 03, 2015 at 02:31:51PM -0800, Paul E. McKenney wrote:
On Tue, Mar 03, 2015 at 05:06:50PM -0500, Boris Ostrovsky wrote:
And the code for this, in xen_cpu_up(), might look something like the
On 03/03/2015 04:26 PM, Paul E. McKenney wrote:
Another strategy is to key off of the return value of cpu_check_up_prepare().
On Tue, Mar 03, 2015 at 03:13:07PM -0500, Boris Ostrovsky wrote:
Yes.
On 03/03/2015 02:42 PM, Paul E. McKenney wrote:
OK, so I have the following patch on top of my previous patch, which
On Tue, Mar 03, 2015 at 02:17:24PM -0500, Boris Ostrovsky wrote:
else
On 03/03/2015 12:42 PM, Paul E. McKenney wrote:
So something like this, then?
}
Not for HVM guests (PV guests will only reach this point after
@@ -511,7 +508,8 @@ static void xen_cpu_die(unsigned int cpu)
 		schedule_timeout(HZ/10);
 	}
-	cpu_die_common(cpu);
+	(void)cpu_wait_death(cpu, 5);
+	/* FIXME: Are the below calls really safe in case of timeout? */
target cpu has been marked as down by the hypervisor).
We need at least to have a message similar to what native_cpu_die()
prints on cpu_wait_death() failure. And I think we should not call
the two routines below (three, actually --- there is also
xen_teardown_timer() below, which is not part of the diff).
-boris
 	xen_smp_intr_free(cpu);
 	xen_uninit_lock_cpu(cpu);
	if (cpu_wait_death(cpu, 5)) {
		xen_smp_intr_free(cpu);
		xen_uninit_lock_cpu(cpu);
		xen_teardown_timer(cpu);
	} else
		pr_err("CPU %u didn't die...\n", cpu);
Easy change for me to make if so!
I believe PV VCPUs will always be CPU_DEAD by the time we get here
Or do I need some other check for HVM-vs.-PV guests, and, if so, what
would that check be? And also if so, is it OK to online a PV guest's
CPU that timed out during its previous offline?
since we are (indirectly) waiting for this in the loop at the
beginning of xen_cpu_die():
'while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu,
NULL))' will exit only after 'HYPERVISOR_vcpu_op(VCPUOP_down,
smp_processor_id(), NULL)' in xen_play_dead(), which happens after
play_dead_common() has marked the cpu as CPU_DEAD.
So no test is needed.
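
The ordering argument above can be sketched as a small user-space model. This is only an illustration of the reasoning for PV guests, not kernel code: the struct and the model_* functions are hypothetical stand-ins for the CPU_DEAD state, VCPUOP_is_up, play_dead_common() and VCPUOP_down, and the two CPUs are interleaved by hand rather than running concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one outgoing PV VCPU. */
struct model_vcpu {
	bool marked_dead;	/* stands in for the CPU_DEAD state */
	bool is_up;		/* stands in for the VCPUOP_is_up result */
};

/*
 * One step of the outgoing CPU. The first step marks the CPU dead
 * (the role of play_dead_common() in the thread); only the second
 * step takes the VCPU down (the role of VCPUOP_down).
 * Returns true once the VCPU is down.
 */
static bool model_play_dead_step(struct model_vcpu *v)
{
	assert(v != NULL);
	if (!v->marked_dead) {
		v->marked_dead = true;
		return false;
	}
	v->is_up = false;
	return true;
}

/*
 * Surviving CPU: poll until the VCPU is no longer up, letting the
 * outgoing CPU make one step of progress per poll. Returns the
 * number of polls it took.
 */
static int model_wait_for_vcpu_down(struct model_vcpu *v)
{
	int polls = 0;

	assert(v != NULL);
	while (v->is_up) {
		model_play_dead_step(v);
		polls++;
	}
	return polls;
}
```

Because the model only clears is_up after marked_dead has been set, the polling loop cannot exit with marked_dead still false, which is the property the explanation relies on.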
I will merge if testing goes well. So if a CPU times out going offline,
the above three functions will not be called, the "didn't die" message
will be printed, and any future attempt to online that CPU will fail.
Is that the correct semantics?
I am not sure whether not ever onlining the CPU is the best outcome
but then I don't think trying to online it again with all interrupts
and such still set up will work well. And it's an improvement over
what we have now anyway (with current code we may clean up things
for a non-dead cpu).
If it returns -EBUSY, then the outgoing CPU finished up after the
surviving CPU timed out. The CPU trying to bring the new CPU online
could (in theory, anyway) do the xen_smp_intr_free(), xen_uninit_lock_cpu(),
and xen_teardown_timer() at that point.
following:
	rc = cpu_check_up_prepare(cpu);
	if (rc && rc != -EBUSY)
		return rc;
	if (rc == -EBUSY) {
		xen_smp_intr_free(cpu);
		xen_uninit_lock_cpu(cpu);
		xen_teardown_timer(cpu);
	}
The idea is that we detect when the CPU eventually took itself offline,
but only did so after the surviving CPU timed out. (Of course, it
would probably be best to put those three statements into a small
function that is called from both places.)
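
A rough user-space sketch of that "small function called from both places" idea follows. Everything here is a hypothetical model rather than the real Xen code: the model_* names and MODEL_* states are invented for illustration, model_xen_cpu_cleanup() merely counts calls where the real helper would invoke xen_smp_intr_free(), xen_uninit_lock_cpu() and xen_teardown_timer(), and returning -EIO for a CPU that never died is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU states for this model only. */
enum model_state {
	MODEL_ONLINE,
	MODEL_DEAD,		/* died before the survivor timed out */
	MODEL_DEAD_LATE,	/* died only after the survivor timed out */
	MODEL_STUCK,		/* never died */
};

static enum model_state model_cpu_state[MODEL_NR_CPUS];
static int model_cleanups[MODEL_NR_CPUS];

/* Stand-in for the three teardown calls; just counts invocations. */
static void model_xen_cpu_cleanup(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_cleanups[cpu]++;
}

/* Stand-in for cpu_check_up_prepare() as described in the thread. */
static int model_cpu_check_up_prepare(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	switch (model_cpu_state[cpu]) {
	case MODEL_DEAD:
		return 0;
	case MODEL_DEAD_LATE:
		return -EBUSY;
	default:
		return -EIO;	/* assumption: anything else cannot be onlined */
	}
}

/* Offline path: clean up only if the CPU died in time. */
static void model_xen_cpu_die(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	if (model_cpu_state[cpu] == MODEL_DEAD)
		model_xen_cpu_cleanup(cpu);
	else
		fprintf(stderr, "CPU %u didn't die...\n", cpu);
}

/* Online path: run the deferred cleanup when the CPU died late. */
static int model_xen_cpu_up(unsigned int cpu)
{
	int rc = model_cpu_check_up_prepare(cpu);

	if (rc && rc != -EBUSY)
		return rc;
	if (rc == -EBUSY)
		model_xen_cpu_cleanup(cpu);
	model_cpu_state[cpu] = MODEL_ONLINE;
	return 0;
}
```

In this model the cleanup runs exactly once per offline/online cycle, either in the offline path or deferred to the next online attempt, and a CPU that never died is refused.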
I have no idea whether this approach would really work, especially given
your earlier statement that CPU_DEAD happens early on. But I offer it in
case it is helpful or sparks some better idea.
Thanx, Paul
But I must defer to you on this sort of thing.
Thanx, Paul
Thanks.
-boris
Thanx, Paul
------------------------------------------------------------------------
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index e2c7389c58c5..f2a06ff0614d 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -508,12 +508,13 @@ static void xen_cpu_die(unsigned int cpu)
 		schedule_timeout(HZ/10);
 	}
 
-	(void)cpu_wait_death(cpu, 5);
-	/* FIXME: Are the below calls really safe in case of timeout? */
-
-	xen_smp_intr_free(cpu);
-	xen_uninit_lock_cpu(cpu);
-	xen_teardown_timer(cpu);
+	if (cpu_wait_death(cpu, 5)) {
+		xen_smp_intr_free(cpu);
+		xen_uninit_lock_cpu(cpu);
+		xen_teardown_timer(cpu);
+	} else {
+		pr_err("CPU %u didn't die...\n", cpu);
+	}
 }
 
 static void xen_play_dead(void) /* used only with HOTPLUG_CPU */