Re: [Bug #11035] System hangs on 2.6.26-rc8

From: Roman Mindalev
Date: Tue Jul 15 2008 - 09:42:38 EST


Roman Mindalev wrote:
> Rafael J. Wysocki wrote:
>> This message has been generated automatically as a part of a report
>> of recent regressions.
>>
>> The following bug entry is on the current list of known regressions
>> from 2.6.25. Please verify if it still should be listed.
>>
>>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11035
>> Subject : System hangs on 2.6.26-rc8
>> Submitter : Roman Mindalev <lists@xxxxxxxxx>
>> Date : 2008-07-02 14:25 (5 days old)
>> References : http://marc.info/?l=linux-kernel&m=121500871414995&w=4
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> Note: this is a long story; if you don't want to read it, skip straight
> to the assumption at the end ;)
>
> Short description of the problem: SIGSEGV under high I/O load (reading the
> package database, untarring a kernel) with nothing recorded in the logs.
>
> Background: 2.6.26-rc7 works, 2.6.26-rc8 is buggy, and the configs are
> very similar.
>
> History: over the last few days I compiled several 2.6.25 kernels (chosen
> because it is a stable release) with different configs, disabling step by
> step the options that differ between -rc7 and -rc8, and got these results:
>
> 2.6.25 (preemptible, preempt rcu, 300 Hz, snd_sequencer, snd_seq_dummy,
> snd_rtctimer, snd_seq_rtctimer_default, debug_preempt, rcu_torture_test)
> - bug
> 2.6.25 (preemptible, preempt rcu, 300 Hz, snd_sequencer, snd_seq_dummy,
> debug_preempt, rcu_torture_test) - bug
> 2.6.25 (preemptible, preempt rcu, 300 Hz, snd_sequencer, debug_preempt,
> rcu_torture_test) - bug
> 2.6.25 (preemptible, preempt rcu, 300 Hz, debug_preempt,
> rcu_torture_test) - bug
> 2.6.25 (preemptible, preempt rcu, 250 Hz, debug_preempt,
> rcu_torture_test) - bug
> 2.6.25 (preemptible, preempt rcu, 250 Hz, snd_rtctimer, debug_preempt,
> rcu_torture_test) - bug
> 2.6.25 (preemptible, preempt rcu, 100 Hz, debug_preempt,
> rcu_torture_test) - bug
> 2.6.25 (preemptible, 250 Hz, debug_preempt, rcu_torture_test) - bug
> 2.6.25 (250 Hz, rcu_torture_test) - bug
> 2.6.25 (250 Hz) - bug
>
> Then I realized that the problem is not (only?) in the kernel but in GCC
> too (I updated the whole system in June).
>
> 2.6.24 is the only kernel (I tested from it up to the latest rc) that
> works (with time.patch) when compiled with GCC 4.3.1.
> 2.6.25 and above (tested with 2.6.26-rc7, 2.6.26-rc8, 2.6.26-rc9) do not
> work if compiled with this compiler version.
>
> Then I looked at my (working) kernels: 2.6.26-rc6 was compiled with GCC
> 4.2.4, and 2.6.26-rc7 too...
>
> For testing purposes I took some of the kernels listed above and
> recompiled them with the other GCC version.
>
> Combined results:
> GCC 4.3.1, kernel 2.6.24 - works
> GCC 4.2.4, kernel 2.6.25 - works
> GCC 4.3.1, kernel 2.6.25 - bug
> GCC 4.2.4, kernel 2.6.26-rc7 - works
> GCC 4.3.1, kernel 2.6.26-rc7 - bug
> GCC 4.2.4, kernel 2.6.26-rc8 - works
> GCC 4.3.1, kernel 2.6.26-rc8 - bug
>
> Assumption: new features (or new bugs :)) in GCC 4.3 conflict with some
> commit(s) merged into the kernel between 2.6.24 and 2.6.25.

I did a bisection.
The result is below:

8f46924600e30b140445f5b84abe9b80d2fff5fb is first bad commit
commit 8f46924600e30b140445f5b84abe9b80d2fff5fb
Author: Ingo Molnar <mingo@xxxxxxx>
Date: Wed Jan 30 13:34:09 2008 +0100

x86: enable CONFIG_DEBUG_PAGEALLOC more widely

make CONFIG_DEBUG_PAGEALLOC universally available.

CONFIG_HIBERNATION and CONFIG_HUGETLBFS was disabling it, for no
particular reason.

If there are any unfixed bugs here we'll fix it, but do not disable
vital debugging facilities like that ..

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

:040000 040000 ea82c42d0972aabc1f34978dc9b9c73edbd7e508 446e8dd9bb2fcb1698d038b09800dc8aa8c335ab M arch
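
For readers unfamiliar with how a bisection arrives at a "first bad commit", here is a minimal Python sketch of the underlying binary search. It is only an illustration: it assumes a simplified linear history in which every commit after the first bad one is also bad, whereas git bisect works on the real commit graph. The commit names and the `is_bad` test are hypothetical placeholders, not the commits tested in this report.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and everything after the first bad commit
    is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Hypothetical linear history; "c5" stands in for the offending commit.
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)  # c5
```

Each step halves the remaining range, so the number of kernels to build and boot grows only with the logarithm of the number of commits between the good and bad versions.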

Commit body:

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2a859a7..347e33e 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -40,7 +40,7 @@ comment "Page alloc debug is incompatible with Software Suspend on i386"

 config DEBUG_PAGEALLOC
 	bool "Debug page memory allocations"
-	depends on DEBUG_KERNEL && !HIBERNATION && !HUGETLBFS
+	depends on DEBUG_KERNEL
 	help
 	  Unmap pages from the kernel linear mapping after free_pages().
 	  This results in a large slowdown, but helps to find certain types
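
To make the effect of the changed "depends on" line concrete, here is a small Python sketch that evaluates both dependency expressions for a config with hibernation and hugetlbfs enabled. It is a simplified model that treats the three symbols as plain booleans; the function names are made up for this illustration, and DEBUG_KERNEL is assumed to be enabled (which the old config implies, since it contains CONFIG_DEBUG_PAGEALLOC=y).

```python
def selectable_before_commit(cfg):
    # depends on DEBUG_KERNEL && !HIBERNATION && !HUGETLBFS
    return cfg["DEBUG_KERNEL"] and not cfg["HIBERNATION"] and not cfg["HUGETLBFS"]


def selectable_after_commit(cfg):
    # depends on DEBUG_KERNEL
    return cfg["DEBUG_KERNEL"]


# Assumed option values: hibernation and hugetlbfs enabled, as in this report.
cfg = {"DEBUG_KERNEL": True, "HIBERNATION": True, "HUGETLBFS": True}

print(selectable_before_commit(cfg))  # False: DEBUG_PAGEALLOC cannot be enabled
print(selectable_after_commit(cfg))   # True: DEBUG_PAGEALLOC can be enabled
```

With the old expression the option could not be enabled on such a config at all, so the commit changes what ends up in the kernel without touching any C code.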

I applied the patch in reverse to the 2.6.25 source and compiled a new
kernel, with hibernation and hugetlbfs both enabled. The difference between
the configs:

diff config-2.6.25-old config-2.6.25-new
4c4
< # Sat Jul 12 17:24:17 2008
---
> # Tue Jul 15 16:24:10 2008
1927d1926
< CONFIG_DEBUG_PAGEALLOC=y

I have no problems with this (new) config.
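
A plain diff of two config files also reports the timestamp comment, as seen above. A rough Python sketch of a comparison that looks only at option values could be written as follows; the helper names and the two short sample configs are invented for illustration and are not the attached files.

```python
def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
    return options


def config_changes(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Invented sample data mirroring the diff above; None means "not present".
old_cfg = "# Sat Jul 12 17:24:17 2008\nCONFIG_DEBUG_KERNEL=y\nCONFIG_DEBUG_PAGEALLOC=y\n"
new_cfg = "# Tue Jul 15 16:24:10 2008\nCONFIG_DEBUG_KERNEL=y\n"
print(config_changes(old_cfg, new_cfg))
```

Comment lines other than "is not set" markers are ignored, so only the missing CONFIG_DEBUG_PAGEALLOC entry shows up as a difference.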

Could this be a conflict between new features in GCC 4.3.1 and pagealloc debug?

Attachment: config-2.6.25-new.tar.gz
Description: GNU Zip compressed data

Attachment: config-2.6.25-old.tar.gz
Description: GNU Zip compressed data