Re: inode_unused list corruption in 2.4.26 - spin_lock problem?

From: Chris Caputo
Date: Fri Jul 02 2004 - 15:01:15 EST


On Fri, 25 Jun 2004, Marcelo Tosatti wrote:
> On Wed, Jun 23, 2004 at 06:50:48PM -0700, Chris Caputo wrote:
> > Is it safe to assume that the x86 version of atomic_dec_and_lock(), which
> > iput() uses, is well trusted? I figure it's got to be, but doesn't hurt
> > to ask.
>
> Pretty sure it is, used all over. You can try to use non-optimize version
> at lib/dec_and_lock.c for a test.

Hi Marcelo. Just an update...

I normally run irqbalance on these systems which are experiencing the
inode_unused list corruption problem.

I disabled irqbalance on two servers which have experienced the
inode_unused list corruption problem and they have now been running
without crashing for considerably longer than they had been before. The
length of run time without problem is not yet conclusive, but I wanted to
give you a heads-up on my progress.

My current theory is that occasionally when irqbalance changes CPU
affinities that the resulting set_ioapic_affinity() calls somehow cause
either inter-CPU locking or cache coherency or ??? to fail.

I CC'ed Arjan because this involves irqbalance, although I'd expect the
fix will be needed in set_ioapic_affinity() or related kernel code.

Any insights welcome. I am not sure what to do next except wait and see
if these servers with irqbalance disabled continue to stay up.

Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/