Re: 2.4.22-pre lockups (now decoded oops for pre10)

From: Andrea Arcangeli
Date: Mon Aug 18 2003 - 13:18:07 EST


On Wed, Aug 13, 2003 at 06:04:05PM +0200, Stephan von Krawczynski wrote:
> On Wed, 13 Aug 2003 19:30:09 +0400
> Oleg Drokin <green@xxxxxxxxxxx> wrote:
>
> > Hello!
> >
> > On Wed, Aug 13, 2003 at 05:12:24PM +0200, Stephan von Krawczynski wrote:
> >
> > > Well, that's exactly the reason why I am awaiting some more days of
> > > up-and-running ext3. After how many days will you be convinced that a
> > > random memory corruption should have hit the ext3 system that bad, that it
> > > should have crashed?
> >
> > Well, I'd prefer that you spend time to figure out at which exact
> > 2.4.21-pre version the crashes in reiserfs started to appear. ;)
>
> Well, Oleg, I'd love to, but there is an immanent problem with that. If
> I check pre-X and it crashes, everything is fine, because I have a certain
> result of the test. If it does not crash within 3 days, then I have a problem.
> How long do I wait before stating the pre is good? It could take months to test
> 10 pre's ... That cannot be the way to find out what is going on.
> On the other hand:
> - no UP kernel ever crashed. So we can at least talk about an SMP-race.
> - 2.4.20 does not crash
> - 2.4.21 does crash

an SMP kernel puts the double of the stress on the mem bus, so it might
still be ram that went bad around the time you upgraded from 2.4.19. Or
it maybe simply a buggy smp motherboard, or whatever.

Of course I can't be sure but we can't exclude it.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/