Re: Reiserfs deadlock in 2.6.36

From: Bastien ROUCARIES
Date: Tue Mar 08 2011 - 03:41:22 EST


On Mon, Mar 7, 2011 at 8:00 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> Hi Bastien,

Cc: Ingo Molnar, because he works a lot on soft lockups and might have
an idea of how to debug this.
Cc: Andrew Morton, who is also tracking "File/memory corruption in 2.6.37".

>> It took me more than two days of testing to reproduce this bug with tracing enabled. My filesystem was quite slow and this bug seems
>> to be timing-related.
>>
>> One pattern that triggers this bug is git. Doing a lot of git work on my desktop crashes my machine.
>>
>> Moreover, trying to reproduce this bug led to data loss. I have rebuilt my / partition twice using --rebuild-tree, and restored
>> my home partition three times from backups.
>>
>> My log is here.
>>
>> Do you need more information?
>
> Yeah, do you have CONFIG_REISERFS_CHECK enabled? I just would
> like to ensure we are not missing this important source of
> information.

Yes, I have it enabled.

> I'm puzzled because, given the traces, your opening and closing of the journal are
> well balanced.
>
> You have a writer queued and stuck, but I see no trace of it in the trace stream.
> I only see well balanced journal operations, including journal closing that would have
> woken your queued writer.
>
> A theory could be that your queued writer was waiting for someone to close the journal,
> which finally happened, but only several minutes later, after so many
> journal openings/closings that they overwrote the old trace containing the queueing of
> the stuck writer.

Running "while true; do sync && sleep 1; done" helps a lot.

>
> I don't know what to do yet. I need to think more about it.
>

Could we do what I suggested at first? That is, use lockdep to track
journal open/close with a fake lock?
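To make the fake-lock idea concrete: in the kernel this would roughly mean a struct lockdep_map with lock_map_acquire()/lock_map_release() around journal begin/end, so that a stuck writer can be correlated with whoever still holds the journal. The sketch below is only a userspace model of that idea, assuming a single journal and single-threaded callers; all names (fake_journal_map, fake_acquire, fake_release, fake_check_idle) are hypothetical and are not reiserfs or lockdep APIs.

```c
#include <stdio.h>

/* Hypothetical userspace model of a lockdep-style "fake lock" around
 * journal begin/end. Not reiserfs code; names are illustrative only. */
struct fake_journal_map {
    const char *name;
    int depth;             /* opens not yet matched by a close */
    const char *last_open; /* call site of the most recent open */
    int errors;            /* unbalanced closes detected */
};

/* Model of annotating journal begin: record that the journal is held. */
static void fake_acquire(struct fake_journal_map *m, const char *site)
{
    m->depth++;
    m->last_open = site;
}

/* Model of annotating journal end: flag a close without a matching open. */
static void fake_release(struct fake_journal_map *m, const char *site)
{
    if (m->depth == 0) {
        fprintf(stderr, "%s: close without open at %s\n", m->name, site);
        m->errors++;
        return;
    }
    m->depth--;
}

/* Where a writer would block: report who still holds the journal.
 * Returns 1 if the journal is idle, 0 if it is still open. */
static int fake_check_idle(const struct fake_journal_map *m)
{
    if (m->depth != 0) {
        fprintf(stderr, "%s: still open (depth %d), last opened at %s\n",
                m->name, m->depth, m->last_open);
        return 0;
    }
    return 1;
}
```

The point of the model is that the holder is recorded at open time, so the information survives even if the trace ring buffer has long since been overwritten, which is the failure mode Frederic describes above.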

BTW, it seems that someone has run into this condition on ext3. I could
do more testing if you want, and I will run xfstests to see
if I can reproduce it more quickly.

Bastien