Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

From: Gabriel C
Date: Tue Mar 18 2008 - 00:25:03 EST


Gabriel C wrote:
> Gabriel C wrote:
>> Thomas Gleixner wrote:
>>> On Mon, 17 Mar 2008, Gabriel C wrote:
>>>>> Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>>>> Submitter : Gabriel C <nix.or.die@xxxxxxxxxxxxxx>
>>>>> Date : 2008-02-24 01:31 (22 days old)
>>>>> References : http://lkml.org/lkml/2008/2/23/380
>>>>> http://lkml.org/lkml/2008/2/24/281
>>>>> Handled-By : Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>>>
>>>> Thomas do you want me to bisect ?
>>> That'd be great.
>> Ok I'll start doing that later on today.
>>
>
> I managed to bisect 'one of the bugs' down , I got some problems and used skip once because a revision didn't compiled ,
> but it seems bisect got the right commit still. Sadly it seems there are 2 different bugs.
>
> Also before I've started the bisect I've tested linux-next to be sure the bug(s) still exists and while rc1 got that already
> I've started to bisect 2.6.24 -> 2.6.25-rc1.
>
> cat .git/refs/bisect/bad
> 1ada5cba6a0318f90e45b38557e7b5206a9cba38
>
> git show 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> Author: Andi Kleen <ak@xxxxxxx>
> Date: Wed Jan 30 13:30:02 2008 +0100
>
> clocksource: make clocksource watchdog cycle through online CPUs
>
> This way it checks if the clocks are synchronized between CPUs too.
> This might be able to detect slowly drifting TSCs which only
> go wrong over longer time.
>
> Signed-off-by: Andi Kleen <ak@xxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index cabfa19..edd5ef8 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -142,8 +142,13 @@ static void clocksource_watchdog(unsigned long data)
> }
>
> if (!list_empty(&watchdog_list)) {
> - __mod_timer(&watchdog_timer,
> - watchdog_timer.expires + WATCHDOG_INTERVAL);
> + /* Cycle through CPUs to check if the CPUs stay synchronized to
> + * each other. */
> + int next_cpu = next_cpu(raw_smp_processor_id(), cpu_online_map);
> + if (next_cpu >= NR_CPUS)
> + next_cpu = first_cpu(cpu_online_map);
> + watchdog_timer.expires += WATCHDOG_INTERVAL;
> + add_timer_on(&watchdog_timer, next_cpu);
> }
> spin_unlock(&watchdog_lock);
> }
> @@ -165,7 +170,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
> if (!started && watchdog) {
> watchdog_last = watchdog->read();
> watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
> - add_timer(&watchdog_timer);
> + add_timer_on(&watchdog_timer, first_cpu(cpu_online_map));
> }
> } else {
> if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
> @@ -186,7 +191,8 @@ static void clocksource_check_watchdog(struct clocksource *cs)
> watchdog_last = watchdog->read();
> watchdog_timer.expires =
> jiffies + WATCHDOG_INTERVAL;
> - add_timer(&watchdog_timer);
> + add_timer_on(&watchdog_timer,
> + first_cpu(cpu_online_map));
> }
> }
> }
>
>
> git bisect log
> git-bisect start
> # bad: [19af35546de68c872dcb687613e0902a602cb20e] Linux 2.6.25-rc1
> git-bisect bad 19af35546de68c872dcb687613e0902a602cb20e
> # good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
> git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
> # bad: [d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f] x86: add PAGE_KERNEL_EXEC_NOCACHE
> git-bisect bad d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f
> # good: [fb46990dba94866462e90623e183d02ec591cf8f] [NETFILTER]: nf_queue: remove unnecessary hook existance check
> git-bisect good fb46990dba94866462e90623e183d02ec591cf8f
> # good: [936722922f6d2366378de606a40c14f96915474d] [IPV4] fib_trie: compute size when needed
> git-bisect good 936722922f6d2366378de606a40c14f96915474d
> # bad: [ff14c6164bd532a6dc9025c07d3b562f839f00a9] x86: x86-64 ia32 ptrace pt_regs cleanup
> git-bisect bad ff14c6164bd532a6dc9025c07d3b562f839f00a9
> # good: [c087567d3ffb2c7c61e091982e6ca45478394f1a] SUNRPC: Remove the obsolete RPC_WAITQ macro
> git-bisect good c087567d3ffb2c7c61e091982e6ca45478394f1a
> # bad: [af7a78e9258ffcca681e080cbc857f854869144f] x86: move mce related declarations
> git-bisect bad af7a78e9258ffcca681e080cbc857f854869144f
> # good: [34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a] SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
> git-bisect good 34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a
> # bad: [83bd01024b1fdfc41d9b758e5669e80fca72df66] x86: protect against sigaltstack wraparound
> git-bisect bad 83bd01024b1fdfc41d9b758e5669e80fca72df66
> # good: [efd9ac8630e89b9ee7ce64008bd7783952374f37] time: fold __get_realtime_clock_ts() into getnstimeofday()
> git-bisect good efd9ac8630e89b9ee7ce64008bd7783952374f37
> # bad: [37a47db8d7f0f38dac5acf5a13abbc8f401707fa] x86: assign IRQs to HPET timers, fix
> git-bisect bad 37a47db8d7f0f38dac5acf5a13abbc8f401707fa
> # skip: [316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba] x86: restrict PIT clocksource usage
> git-bisect skip 316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba
> # bad: [4713e22ce81eb8b3353e16435362eb3d0ec95640] clocksource: add unregister function to disable unusable clocksources
> git-bisect bad 4713e22ce81eb8b3353e16435362eb3d0ec95640
> # bad: [1ada5cba6a0318f90e45b38557e7b5206a9cba38] clocksource: make clocksource watchdog cycle through online CPUs
> git-bisect bad 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> # good: [1077f5a917b7c630231037826b344b2f7f5b903f] clocksource.c: use init_timer_deferrable for clocksource_watchdog
> git-bisect good 1077f5a917b7c630231037826b344b2f7f5b903f
>
>
> Also the broken revision died with that :
>
> arch/x86/kernel/i8253.c: In function 'init_pit_clocksource':
> arch/x86/kernel/i8253.c:207: error: implicit declaration of function 'is_hpet_enabled'
> make[1]: *** [arch/x86/kernel/i8253.o] Error 1
> make: *** [arch/x86/kernel] Error 2
>
> If you tell me on how to fix that I'll restart the bisect from there , just in case ..
>
>
> Also reverting the commit from 2.6.25-rc1 fixes the 'Tsc being unstable thing' but it does not fix the hang
> when I boot with clocksource=acpi_pm so that seems to be introduced in a different commit.
>
> I will try to bisect this hang also , most probably on weekend.
>
>
> Also I reverted that commit from git head and an kernel compiles right now, I'll let you know in a bit if that worked out.

Worked out :)

git head - 1ada5cba6a0318f90e45b38557e7b5206a9cba38 works here.

dmesg|grep clocksource
[ 0.563915] Time: tsc clocksource has been installed.

uname -a
Linux lara 2.6.25-rc6-00014-gbde4f8f-dirty #2 SMP PREEMPT Tue Mar 18 04:48:53 CET 2008 i686 GNU/Linux


Gabriel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/