Re: Regression in linux 2.6.37: failure on remount / (ext4) rw

From: Pekka Enberg
Date: Tue Jan 18 2011 - 08:57:22 EST


Hi,

[ Hey, why was I dropped from the CC? ]

On Mon, Jan 17, 2011 at 2:32 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
> On Fri, Jan 14, 2011 at 9:04 AM, Matthias Merz <linux@xxxxxxxxxx> wrote:
>> Hello,
>>
>> On Wed, 12.01.2011 09:03, Pekka Enberg wrote:
>>> On Tue, Jan 11, 2011 at 3:09 PM, Matthias Merz <linux@xxxxxxxxxx> wrote:
>>> > On Tue, 11.01.2011 09:50, Pekka Enberg wrote:
>>> >> On Tue, Jan 11, 2011 at 12:31 AM, Matthias Merz <linux@xxxxxxxxxx> wrote:
>>> >> > This morning I tried vanilla 2.6.37 on my desktop system, which
>>> >> > failed to boot but kept printing debug messages too fast
>>> >> > to read. Using netconsole I was then able to capture them [see
>>> >> > below]. I was able to trigger this bug even with init=/bin/bash
>>> >> > by a simple call of "mount -o remount,rw /" with my / being an
>>> >> > ext4 filesystem.
>>> > [erroneous bisecting] I assume some "hardware state" influences
>>> > triggering of this bug
>>
>>> Would it be possible for you to try to bisect it again? The oops you
>>> report looks slightly obscure (at least to me) so it might be
>>> difficult to find someone to fix it.
>>
>> Calling back after some time. Now I seem to have worked out a way to
>> tell which versions are bad: After having booted a "good" version, a
>> power-down of several minutes (about 15 or so) is needed; otherwise
>> every version will appear "good". So I checked by first booting a "known
>> bad" 2.6.37. If that boot failed, I booted the version I wished to
>> check, which seems to have produced usable results. So I was/am pretty
>> convinced that something during "hardware setup" has changed which will
>> survive a normal reset due to capacitances not fully discharged or
>> something like that.
>>
>>
>> git bisect now told me "22d4cd4c4dce6d7b7d9a7e396aa4f87fe7a649b1 is the
>> first bad commit", which is titled: "x86-32: Allocate irq stacks
>> seperate from percpu area".
>>
>> I reverted this change (and the following commit 47f19a0814, due to
>> #defines) and waited overnight until this morning. That revert really seems to
>> fix my problem. So maybe in my special case something goes wrong with
>> the new method?
>
> Does this patch fix the problem?
>
> Subject: [PATCH] x86: Clear irqstack thread_info
>
> Make sure that the thread_info part of the irqstack is initialized
> to zeroes.
>
> Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>

Acked-by: Pekka Enberg <penberg@xxxxxxxxxx>