Re: BUG during shutdown - bisected to commit e2912009

From: Xiaotian Feng
Date: Tue Jan 05 2010 - 05:18:50 EST


On 01/05/2010 11:23 AM, Marc Dionne wrote:
On Mon, Jan 4, 2010 at 9:56 PM, Xiaotian Feng <dfeng@xxxxxxxxxx> wrote:
On 01/05/2010 02:43 AM, Marc Dionne wrote:

On Fri, Jan 1, 2010 at 7:42 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

On Fri, 2010-01-01 at 19:27 -0500, Marc Dionne wrote:

I'm getting a BUG with current kernels from
kernel/time/clockevents.c:263 when halting the system - a restart
behaves normally. I don't have a good camera handy at the moment to
capture the call stack on screen, but the call sequence is:

clockevents_notify
hrtimer_cpu_notify
notifier_call_chain
raw_notifier_call_chain
_cpu_down
disable_nonboot_cpus
kernel_power_off
sys_reboot

I bisected it down to commit e2912009: sched: Ensure set_task_cpu() is
never called on blocked tasks. There were a few commits tested along
the way where I got a freeze (with the power still on) instead of a
BUG. Reverting that commit from the current kernel doesn't look
trivial, but the commit immediately preceding this one does halt fine.

We somehow seem to trip up the below patch, which doesn't really make
sense, as I can't find how task placement would affect the below error.

It seems to purely test against the hot-unplugged cpu, not a cpu the
task is running on.

---
commit bb6eddf7676e1c1f3e637aa93c5224488d99036f
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Thu Dec 10 15:35:10 2009 +0100

Probably predictable, but worth testing: reverting that patch does
allow my system to shut down cleanly.

Reverting that patch removes the BUG_ON, which is why you can shut down
cleanly.

Could you please attach your kernel config file? I'm a little confused about
how you reverted e2912009. Did you do it manually? I can't see any connection
between e2912009 and bb6eddf7. Could you please also show me your timer list
(cat /proc/timer_list)?

The config is attached, and the output of cat /proc/timer_list is also
attached (it's rather large).

To recap:
- Reverting bb6eddf7 gives me a clean shutdown - predictable of course
since it removes the BUG_ON
- I wasn't able to trivially revert e2912009 from a current kernel.
But it fails to shut down, while the preceding commit is OK.

So it would seem that e2912009 is triggering something that the check
in bb6eddf7 is catching.

With more recent kernels (but not the ones around e2912009), I do get
these timer-related warnings in dmesg (and briefly on screen):

PCSP: Timer resolution is not sufficient (999848nS)
PCSP: Make sure you have HPET and ACPI enabled.
PCSP: Turned into nopcm mode.

These messages are printed by the sound module and will not affect clockevents. Could you please try the following patch and let me know what it prints before the BUG_ON happens? That way we can gather more information about the BUG_ON. Thank you.

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6f740d9..7c945e8 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -260,6 +260,9 @@ void clockevents_notify(unsigned long reason, void *arg)
 		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
 			if (cpumask_test_cpu(cpu, dev->cpumask) &&
 			    cpumask_weight(dev->cpumask) == 1) {
+				if (dev->mode != CLOCK_EVT_MODE_UNUSED)
+					printk("invalid dev %s mode %d on cpu %d\n", dev->name,
+					       dev->mode, cpu);
 				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
 				list_del(&dev->list);
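
To make clear what the debug print is meant to catch, here is a rough userspace model of the check. It is only a sketch, not kernel code: the struct, the FAKE_MODE_* values, the plain unsigned long used as a cpumask, and the helper names are all simplified stand-ins invented for illustration. It only demonstrates the condition under test, namely that a device serving solely the CPU being taken down is expected to be in the unused mode by the time the notifier runs.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's clock event modes. */
enum fake_mode {
	FAKE_MODE_UNUSED = 0,
	FAKE_MODE_SHUTDOWN,
	FAKE_MODE_PERIODIC,
	FAKE_MODE_ONESHOT,
};

/* Simplified device: bit n of cpumask set means the device serves CPU n. */
struct fake_clock_event_device {
	const char *name;
	enum fake_mode mode;
	unsigned long cpumask;
};

/* Number of CPUs in the mask (stand-in for cpumask_weight()). */
static int mask_weight(unsigned long mask)
{
	int n = 0;

	while (mask) {
		n += (int)(mask & 1UL);
		mask >>= 1;
	}
	return n;
}

/*
 * Walk the devices the way the patched loop does and report every
 * device that serves only the dead CPU but is not in the unused mode.
 * Returns how many such devices were found; in the kernel the first
 * one would hit the BUG_ON.
 */
static int count_offending_devices(const struct fake_clock_event_device *devs,
				   int ndevs, int cpu)
{
	int bad = 0;

	for (int i = 0; i < ndevs; i++) {
		const struct fake_clock_event_device *dev = &devs[i];

		/* Only per-cpu devices bound to exactly this CPU are checked. */
		if (!(dev->cpumask & (1UL << cpu)) ||
		    mask_weight(dev->cpumask) != 1)
			continue;

		if (dev->mode != FAKE_MODE_UNUSED) {
			printf("invalid dev %s mode %d on cpu %d\n",
			       dev->name, (int)dev->mode, cpu);
			bad++;
		}
	}
	return bad;
}
```

Note that a device shared between several CPUs is skipped regardless of its mode, and nothing in the check looks at which CPU any task is running on. Only the mode of the device left behind on the unplugged CPU matters.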

Marc
