Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH]acpi : remove power from acpi_processor_cx structure)

From: Daniel Lezcano
Date: Mon Sep 10 2012 - 15:45:13 EST


On 09/10/2012 07:14 PM, John Stultz wrote:
> On 09/07/2012 02:35 PM, Daniel Lezcano wrote:
>> On 09/07/2012 07:22 PM, John Stultz wrote:
>>> On 09/07/2012 07:20 AM, Daniel Lezcano wrote:
>>>> On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
>>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>>>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>>>>> I ran into this issue because NETCONSOLE is set; disabling it allowed
>>>>>>>> me to go further.
>>>>>>>>
>>>>>>>> Unfortunately, I am facing some random freezes on the system, which
>>>>>>>> seem to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>>>>>
>>>>>>>> Disabling either of them makes the freezes disappear.
>>>>>>>>
>>>>>>>> Is it a known issue?
>>>>>>> Well, there are systems having problems with this configuration, but
>>>>>>> they should be exceptional. What system is that?
>>>>>> It is a T61p laptop with a Core 2 Duo T9500. Nothing exceptional, I
>>>>>> believe. Maybe someone else has hit the same issue?
>>>>> Is it a regression for you?
>>>> Yes, I think so. The issue appears between v3.5 and v3.6-rc1.
>>>>
>>>> It is not easy to reproduce, but after taking some time to dig, it seems
>>>> to appear with this commit:
>>>>
>>>> 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
>>>> commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
>>>> Author: John Stultz <john.stultz@xxxxxxxxxx>
>>>> Date: Fri Jul 13 01:21:53 2012 -0400
>>>>
>>>> time: Condense timekeeper.xtime into xtime_sec
>>>>
>>>> The timekeeper struct has a xtime_nsec, which keeps the
>>>> sub-nanosecond remainder. This ends up being somewhat
>>>> duplicative of the timekeeper.xtime.tv_nsec value, and we
>>>> have to do extra work to keep them apart, copying the full
>>>> nsec portion out and back in over and over.
>>>>
>>>> This patch simplifies some of the logic by taking the timekeeper
>>>> xtime value and splitting it into timekeeper.xtime_sec and
>>>> reuses the timekeeper.xtime_nsec for the sub-second portion
>>>> (stored in higher res shifted nanoseconds).
>>>>
>>>> This simplifies some of the accumulation logic. And will
>>>> allow for more accurate timekeeping once the vsyscall code
>>>> is updated to use the shifted nanosecond remainder.
>>>>
>>>> Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx>
>>>> Reviewed-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>>> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>>>> Cc: Richard Cochran <richardcochran@xxxxxxxxx>
>>>> Cc: Prarit Bhargava <prarit@xxxxxxxxxx>
>>>> Link: http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@xxxxxxxxxx
>>>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>>
>>>> :040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934 dc5708bc738af695f092bf822809b13a1da104b6 M kernel
>>>>
>>>> How to reproduce: on a T61p laptop with a Core 2 Duo, I boot the kernel
>>>> into busybox and wait a few minutes before typing something on the
>>>> console. At that point nothing appears on the console, but the characters
>>>> are echoed several seconds later (it could be 1, 5, or 10 secs or more).
>>>>
>>>> That happens when CONFIG_CPU_IDLE and CONFIG_NO_HZ are both set. With
>>>> either one disabled, the issue does not appear.
>>> Thanks for bisecting this down and the heads up!
>>>
>>> Right off I can't see what might be causing this. Bunch of questions:
>>>
>>> Is this a 32 or 64 bit kernel?
>> It is a 32 bit kernel.
>
> Thanks for your answers! Has this been seen on 3.6-rc4+ kernels?
> There were a few casting fixes that landed in 3.6-rc4 that would
> affect 32bit systems.

OK, I will have to check that. Unfortunately, I cannot do it before Wednesday.

>
> In the meantime, I'll try to reproduce on my T61. If you could send me
> your .config, I'd appreciate it.

http://pastebin.com/qSxqfdDK

The header of the config file shows v3.5-rc7 because it is the result of
the git-bisect. If you use this config file with the latest kernel, it
should reproduce the problem.

Let me know if you are able to reproduce the problem.

Thanks
-- Daniel

--
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

