Re: 2.4.23aa1 ext3 oops

From: Jamie Clark
Date: Thu Dec 18 2003 - 01:21:21 EST


David Woodhouse wrote:

> On Thu, 2003-12-11 at 10:33 +0800, Jamie Clark wrote:
>
> > After a quick browse of the assembler output the zeroing would appear to be part of the list_del inline, and edi seems to equate to &sb.
>
> Seems reasonable. It does look like something's stomped on sb->s_dirty.
>
> > __mark_inode_dirty() does not appear to take sb_lock before adding to the s_dirty list. Could that be the culprit?
>
> I don't think so; it's holding the inode_lock which should be
> sufficient. Besides -- in practice all updates to the 4-byte pointer
> sb->s_dirty.next are going to be atomic, and there's no reason _ever_
> for it to be set to d7ffbc08. It's hard to see how a simple locking
> problem is going to cause such a thing.


True. I confess I didn't think too hard after narrowing down where it tripped up.
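
For anyone following along, here is a minimal userspace sketch of the unlink-and-zero behaviour discussed above. It is only loosely modelled on the kernel's list_del inline and is not the 2.4 source; list_init, list_is_consistent and list_demo are hypothetical helpers added for illustration. It shows why a stomped next/prev pointer in a list head such as sb->s_dirty would fault inside the unlink: the first thing the unlink does is write through both neighbour pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical helper: make 'head' an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert 'entry' immediately after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink 'entry', then zero its own pointers (assumed behaviour). */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;  /* writes through entry->prev */
    entry->next->prev = entry->prev;  /* writes through entry->next */
    entry->next = NULL;
    entry->prev = NULL;
}

/* Hypothetical helper: walk the ring and check next/prev agree. */
static int list_is_consistent(const struct list_head *head)
{
    const struct list_head *p = head;

    do {
        if (p->next == NULL || p->next->prev != p)
            return 0;
        p = p->next;
    } while (p != head);
    return 1;
}

/* Hypothetical demo: returns 1 if the list behaves as expected. */
static int list_demo(void)
{
    struct list_head s_dirty, a, b;

    list_init(&s_dirty);
    list_add(&a, &s_dirty);
    list_add(&b, &s_dirty);
    if (!list_is_consistent(&s_dirty))
        return 0;

    list_del(&a);
    return list_is_consistent(&s_dirty) &&
           a.next == NULL && a.prev == NULL;
}
```

In this sketch a consistency walk like list_is_consistent is only safe when the bad pointer still points at readable memory; with a wild value the walk itself would fault, much as the unlink does.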

> How repeatable is this?

The first oops occurred after 4 or 5 days. My second run crashed in the first night, this time in filemap.c: precheck_file_write().
This oops seemed to be at or near the first dereference of inode, before the f_flags test.

/* FIXME: this is for backwards compatibility with 2.4 */
if (!S_ISBLK(inode->i_mode) && (file->f_flags & O_APPEND))
        *ppos = pos = inode->i_size;

>>EIP; c01306fb <precheck_file_write+53/1f8> <=====
Trace; c01308f8 <generic_file_write_nolock+58/4c8>
Trace; c013108f <generic_file_write+13f/158>
Trace; c016f3a7 <ext3_file_write+23/bc>
Trace; c0140e57 <sys_write+8f/100>
Trace; c0107133 <system_call+33/38>

> Can you turn on slab poisoning?


Is CONFIG_DEBUG_SLAB all that I need?
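
As a rough illustration of what slab poisoning buys here, the following is a userspace toy, not the kernel's slab allocator. The struct toy_cache type, the toy_* functions, the object size and the fill value are all assumptions chosen for the sketch. The idea is that freed objects are filled with a known pattern and the pattern is checked again on the next allocation, so a stray write into freed memory is caught close to where it happened rather than days later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0x5a  /* illustrative fill pattern (assumption) */
#define OBJ_SIZE    32    /* illustrative object size (assumption) */

/* Hypothetical one-object "cache" standing in for a slab. */
struct toy_cache {
    unsigned char obj[OBJ_SIZE];
    int in_use;
};

/* Start with the object free and fully poisoned. */
static void toy_cache_init(struct toy_cache *c)
{
    memset(c->obj, POISON_BYTE, OBJ_SIZE);
    c->in_use = 0;
}

/* Return 1 if every byte of the free object still holds the pattern. */
static int toy_poison_intact(const struct toy_cache *c)
{
    size_t i;

    for (i = 0; i < OBJ_SIZE; i++)
        if (c->obj[i] != POISON_BYTE)
            return 0;
    return 1;
}

/*
 * Hand out the object, but only after verifying the poison.
 * Returns NULL if the object is already in use or if something
 * wrote to it while it was free.
 */
static void *toy_alloc(struct toy_cache *c)
{
    if (c->in_use || !toy_poison_intact(c))
        return NULL;
    c->in_use = 1;
    return c->obj;
}

/* Release the object and re-poison it. */
static void toy_free(struct toy_cache *c)
{
    memset(c->obj, POISON_BYTE, OBJ_SIZE);
    c->in_use = 0;
}
```

A write through a stale pointer after toy_free() leaves a non-pattern byte behind, and the next toy_alloc() refuses the object. In a real kernel the equivalent check would be expected to report the corruption instead of silently returning NULL.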

I'm currently running 2.4.23aa1 with the inode.c patch reversions as Marcelo first suggested. When (IF) that crashes, or if I can get the test running on another SMP box I will revert to 2.4.23 + the qla2300 driver that I need for the fibre-channel array.

-Jamie
