Re: 2.6.25: random stalls on certain hardware - regression?

From: Michael Tokarev
Date: Thu Jul 10 2008 - 03:11:48 EST


Oliver Neukum wrote:

[hard hangs in 2.6.25+ may be related to select()]

select(), but not necessarily the first call. I've got a report which seems
to indicate a rogue pointer while building the poll table. You might
strace a test program and see whether it hangs in select() if it manages
to trigger the lockup.

I wrote a small program that opens and closes TCP and Unix sockets at random,
calls listen() on them, and select()s them with a zero timeout, with short
sleep()s in between. So most of the time it is in sleep(), and only briefly
in select() and the other syscalls. If strace shows it hung in sleep()
(where it spends most of its time), we can't prove anything, and if it
hangs in select()... well, we can't really prove anything either, I think ;)
The machine hung again overnight, this time with the 2.6.26-pre9 kernel, but
unfortunately I wasn't able to see the strace output. I will try again tonight.
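
For illustration, a test program of the kind described above might look
roughly like the sketch below. This is not the original program, which is
not shown in this message; it is a Python approximation, and the function
names (open_random_listener, close_listener, stress), the 16-socket cap,
the 0.05 s nap and the loopback bind are all assumptions. Which syscall
strace reports for the sleep depends on the interpreter version.

```python
import os
import random
import select
import socket
import tempfile
import time


def open_random_listener(tmpdir):
    """Open a TCP or Unix stream socket at random and put it in listen()."""
    if random.random() < 0.5:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        path = None
    else:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        path = os.path.join(tmpdir, "s%d" % random.getrandbits(32))
        sock.bind(path)
    sock.listen(5)
    return sock, path


def close_listener(sock, path):
    """Close a listener and remove its socket file, if it has one."""
    sock.close()
    if path is not None:
        os.unlink(path)


def stress(iterations, max_socks=16, nap=0.05):
    """Run the open/close/select loop; return the number of select() calls."""
    listeners = []
    calls = 0
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            for _ in range(iterations):
                # Randomly grow or shrink the set of listening sockets.
                shrink = len(listeners) >= max_socks or random.random() < 0.5
                if listeners and shrink:
                    victim = listeners.pop(random.randrange(len(listeners)))
                    close_listener(*victim)
                else:
                    listeners.append(open_random_listener(tmpdir))
                # Zero timeout: poll once and return immediately.
                socks = [s for s, _ in listeners]
                select.select(socks, [], [], 0)
                calls += 1
                time.sleep(nap)  # most wall-clock time is spent here
        finally:
            for sock, path in listeners:
                close_listener(sock, path)
    return calls
```

Calling stress() with a large iteration count from a small wrapper, and
running that wrapper under something like "strace -tt -o trace.log", should
leave a timestamped trace whose last entry is the syscall that was in
flight when the machine stopped, provided the log survives the hang.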

The question is whether it is really worth the effort. We can't prove
anything either way: if the program was hung in sleep(), that may be because
some OTHER program was in a "bad" select() at that moment, and if strace
shows it hung in select(), some other part of the system may have hung at
that moment (though that is less likely, since the time spent in select()
is very small compared with the time spent in sleep()).

Thanks!

/mjt