Re: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

From: Jeremy Fitzhardinge
Date: Mon Jun 25 2007 - 08:41:36 EST


Ingo Molnar wrote:
* Ingo Molnar <mingo@xxxxxxx> wrote:

hm, restoring nmi.c to the v2.6.21 state does not fix the nmi_watchdog=2 hang. I'll do a bisection run.

and after spending an hour on 15 bisection steps:

git-bisect start
git-bisect good d1be341dba5521506d9e6dccfd66179080705bea
git-bisect bad a06381fec77bf88ec6c5eb6324457cb04e9ffd69
git-bisect bad 794543a236074f49a8af89ef08ef6a753e4777e5
git-bisect good 24a77daf3d80bddcece044e6dc3675e427eef3f3
git-bisect bad ea62ccd00fd0b6720b033adfc9984f31130ce195
git-bisect good 7e20ef030dde0e52dd5a57220ee82fa9facbea4e
git-bisect bad f19cccf366a07e05703c90038704a3a5ffcb0607
git-bisect good 0d08e0d3a97cce22ebf80b54785e00d9b94e1add
git-bisect bad 856f44ff4af6e57fdc39a8b2bec498c88438bd27
git-bisect bad f8822f42019eceed19cc6c0f985a489e17796ed8
git-bisect good 1c3d99c11c47c8a1a9ed6a46555dbf6520683c52
git-bisect good b239fb2501117bf3aeb4dd6926edd855be92333d
git-bisect good 98de032b681d8a7532d44dfc66aa5c0c1c755a9d
git-bisect good 42c24fa22e86365055fc931d833f26165e687c19

the winner is ...

f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
commit f8822f42019eceed19cc6c0f985a489e17796ed8
Author: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Date: Wed May 2 19:27:14 2007 +0200

[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable

... our wonderful paravirt subsystem, honed to eternal perfection by the testing-machine x86_64 tree.

reverting -git-curr's paravirt.c, paravirt.h, smp.c and tlbflush.h to before the bad commit makes the NMI watchdog work again. Patch against -rc6 is below.
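For readers following along: the bisection run quoted above is, in essence, a binary search over the commit history. Below is a minimal sketch of that idea in C, assuming a strictly linear history in which every commit after the first bad one is also bad. Real git-bisect works on a commit graph, so this is a simplification. The names `bisect_first_bad`, `is_bad_fn` and `bad_from` are made up for illustration and are not part of git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: returns nonzero if the commit at `index` is bad. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Find the first bad commit in a linear history.
 * `good` is the index of a known-good commit, `bad` the index of a
 * known-bad one, with good < bad.  Assumes every commit after the
 * first bad one is bad as well.  The number of commits that had to
 * be tested is stored in *steps.
 */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad,
			void *ctx, size_t *steps)
{
	assert(good < bad);
	*steps = 0;

	while (bad - good > 1) {
		/* Test the commit halfway between the two known points. */
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* first bad commit is at or before mid */
		else
			good = mid;	/* first bad commit is after mid */
		(*steps)++;
	}
	return bad;
}

/* Example predicate: ctx points at the index of the commit that broke things. */
int bad_from(size_t index, void *ctx)
{
	return index >= *(size_t *)ctx;
}
```

Each test halves the remaining range, so the number of builds grows with the logarithm of the range size. With an arbitrary range of 20000 commits (a number picked for illustration, not taken from this thread), the loop above needs at most 15 tests.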

Er, wow. I've been running with this stuff for months without a problem. Do you have CONFIG_PARAVIRT enabled? Do you still get the hang if you boot with "noreplace-paravirt" to disable the patching?

Your revert patch seems to take out quite a lot of stuff, some unrelated to the paravirt_ops. Where did that come from?

I presume there's one bad callsite in here which is used by the NMI path more or less exclusively. Is the bug simply that it hangs if you boot with nmi_watchdog=2? I.e., no other details?

@@ -222,10 +211,30 @@ void send_IPI_mask_sequence(cpumask_t ma
 	 */

 	local_irq_save(flags);
+
 	for (query_cpu = 0; query_cpu < NR_CPUS; ++query_cpu) {
 		if (cpu_isset(query_cpu, mask)) {
-			__send_IPI_dest_field(cpu_to_logical_apicid(query_cpu),
-					      vector);
+
+			/*
+			 * Wait for idle.
+			 */
+			apic_wait_icr_idle();
+
+			/*
+			 * prepare target chip field
+			 */
+			cfg = __prepare_ICR2(cpu_to_logical_apicid(query_cpu));
+			apic_write_around(APIC_ICR2, cfg);
+
+			/*
+			 * program the ICR
+			 */
+			cfg = __prepare_ICR(0, vector);
+
+			/*
+			 * Send the IPI. The write to APIC_ICR fires this off.
+			 */
+			apic_write_around(APIC_ICR, cfg);
 		}
 	}
 	local_irq_restore(flags);

What's this? This isn't paravirt_ops related, is it?
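For reference, the sequence the hunk above open-codes (wait for the ICR to go idle, write the destination to ICR2, then write the ICR to fire the IPI) can be modelled in user space roughly as follows. This is only a sketch against a mock register file, not the kernel code. The register offsets and bit values are the ones I recall from apicdef.h and should be treated as assumptions. The `mock_*` names, the spin limit and the fixed-delivery-mode simplification are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and bits as recalled from the kernel's apicdef.h (assumed values). */
#define APIC_ICR		0x300
#define APIC_ICR2		0x310
#define APIC_ICR_BUSY		0x01000
#define APIC_DEST_LOGICAL	0x00800
#define APIC_DM_FIXED		0x00000
#define SET_APIC_DEST_FIELD(x)	((uint32_t)(x) << 24)

/* Hypothetical stand-in for the memory-mapped local APIC registers. */
struct mock_apic {
	uint32_t reg[0x40];	/* one slot per 16-byte register */
	unsigned int writes;	/* number of register writes seen */
};

static uint32_t mock_apic_read(const struct mock_apic *a, unsigned int off)
{
	return a->reg[off >> 4];
}

static void mock_apic_write(struct mock_apic *a, unsigned int off, uint32_t val)
{
	a->reg[off >> 4] = val;
	a->writes++;
}

/* Spin until the ICR busy bit clears; give up after `limit` polls. */
static int mock_wait_icr_idle(const struct mock_apic *a, unsigned int limit)
{
	while (limit--) {
		if (!(mock_apic_read(a, APIC_ICR) & APIC_ICR_BUSY))
			return 0;
	}
	return -1;
}

/* Send a fixed-mode IPI to one logical APIC ID, in the three steps of the hunk. */
int mock_send_ipi_logical(struct mock_apic *a, unsigned int logical_apicid,
			  unsigned int vector)
{
	assert(vector <= 0xff);

	/* Step 1: wait for any previous IPI to be accepted. */
	if (mock_wait_icr_idle(a, 1000) != 0)
		return -1;

	/* Step 2: program the destination field in ICR2. */
	mock_apic_write(a, APIC_ICR2, SET_APIC_DEST_FIELD(logical_apicid));

	/* Step 3: writing the ICR is what actually sends the IPI. */
	mock_apic_write(a, APIC_ICR, APIC_DM_FIXED | APIC_DEST_LOGICAL | vector);
	return 0;
}
```

The order matters in this model: the destination has to be in ICR2 before the ICR write, because the ICR write is the trigger. The mock never sets the busy bit on its own, so a caller has to set it by hand to exercise the timeout path.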

J