Re: BUG during shutdown - bisected to commit e2912009

From: Marc Dionne
Date: Mon Jan 04 2010 - 22:23:29 EST


On Mon, Jan 4, 2010 at 9:56 PM, Xiaotian Feng <dfeng@xxxxxxxxxx> wrote:
> On 01/05/2010 02:43 AM, Marc Dionne wrote:
>>
>> On Fri, Jan 1, 2010 at 7:42 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>> On Fri, 2010-01-01 at 19:27 -0500, Marc Dionne wrote:
>>>>
>>>> I'm getting a BUG with current kernels from
>>>> kernel/time/clockevents.c:263 when halting the system - a restart
>>>> behaves normally.  I don't have a good camera handy at the moment to
>>>> capture the call stack on screen, but the call sequence is:
>>>>
>>>> clockevents_notify
>>>> hrtimer_cpu_notify
>>>> notifier_call_chain
>>>> raw_notifier_call_chain
>>>> _cpu_down
>>>> disable_nonboot_cpus
>>>> kernel_power_off
>>>> sys_reboot
>>>>
>>>> I bisected it down to commit e2912009: sched: Ensure set_task_cpu() is
>>>> never called on blocked tasks.  There were a few commits tested along
>>>> the way where I got a freeze (with the power still on) instead of a
>>>> BUG. Reverting that commit from the current kernel doesn't look
>>>> trivial, but the commit immediately preceding this one does halt fine.
>>>
>>> We somehow seem to trip up the below patch, which doesn't really make
>>> sense, as I can't find how task placement would affect the below error.
>>>
>>> It seems to purely test against the hot-unplugged cpu, not a cpu the
>>> task is running on.
>>>
>>> ---
>>> commit bb6eddf7676e1c1f3e637aa93c5224488d99036f
>>> Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>> Date:   Thu Dec 10 15:35:10 2009 +0100
>>
>> Probably predictable but worth testing: reverting that patch does
>> allow my system to shut down cleanly.
>
> That BUG_ON was removed by reverting that patch, so you can shut down
> cleanly.
>
> Could you please attach your kernel config file? I'm a little confused
> about how you reverted e2912009; did you do it manually? I can't see any
> connection between e2912009 and bb6eddf7. Could you please show me your
> timer list (cat /proc/timer_list)?

The config is attached, and the output of cat /proc/timer_list is also
attached (it's rather large).

To recap:
- Reverting bb6eddf7 gives me a clean shutdown - predictable of course
since it removes the BUG_ON
- I wasn't able to trivially revert e2912009 from a current kernel.
But a kernel built at that commit fails to shut down, while one built at
the preceding commit halts fine.

So it would seem that e2912009 is triggering something that the check
in bb6eddf7 is catching.
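
To make that concrete, here is a toy user-space model of the kind of
check I believe is involved. It is only a sketch: it assumes the check
verifies that any clock event device bound solely to the unplugged CPU
has already been put into an unused state. Every name in it
(toy_clockevent, toy_mode, toy_cpu_dead_check) is made up for
illustration and is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum toy_mode { TOY_MODE_UNUSED, TOY_MODE_ONESHOT };

struct toy_clockevent {
    int cpu;             /* the single CPU this device serves (assumed) */
    enum toy_mode mode;  /* current operating mode */
};

/*
 * Walk all devices and look only at those bound to the CPU that was
 * just unplugged. Returns true if each of them is unused. A false
 * return marks the situation in which this toy model would "BUG".
 * Note that it never looks at which CPU any task is running on.
 */
bool toy_cpu_dead_check(const struct toy_clockevent *devs, size_t n,
                        int dead_cpu)
{
    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (devs[i].cpu == dead_cpu && devs[i].mode != TOY_MODE_UNUSED)
            return false;
    }
    return true;
}
```

In this model, a device for CPU 1 that is still in TOY_MODE_ONESHOT when
CPU 1 goes down makes toy_cpu_dead_check() return false; once that
device is marked TOY_MODE_UNUSED the same call returns true. If the real
check works along these lines, something after e2912009 would have to
leave the device for the dying CPU in use at the time of the notifier
call.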

With more recent kernels (but not the ones around e2912009), I do get
these timer-related warnings in dmesg (and briefly on screen):

PCSP: Timer resolution is not sufficient (999848nS)
PCSP: Make sure you have HPET and ACPI enabled.
PCSP: Turned into nopcm mode.

Marc

Attachment: .config
Description: Binary data

Attachment: timers
Description: Binary data