Re: Strange system hangs

From: Krzysztof Oledzki
Date: Sat Sep 29 2007 - 11:54:54 EST

On Sat, 29 Sep 2007, Nick Piggin wrote:

On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:

I am experiencing weird system hangs. About once every 2-5 weeks the system
freezes and stops servicing remote connections, so it is no longer possible
to reach the most important services: smtp (postfix), www (squid) or even
ssh. A new connection is accepted, but then it hangs.

What is strange is that a previously established ssh session remains usable.
It is possible to work on such a system until you do something stupid like
"less /var/log/all.log". Using strace I found that the process blocks on:

Is this a regression? If so, what's the most recent kernel that didn't show
the problem?

I don't know. The first kernel I ran was 2.6.20.x. This is quite a fresh system.

The symptoms could be consistent with some place doing a
balance_dirty_pages while holding a lock that is required for IO, but I can't
see a smoking gun (you've got contention on i_mutex, but that should be

Can you see if there is any memory under writeback that isn't being
completed (sysrq+M)? Also, a list of the locks held after the hang might be
helpful (compile in lockdep and use sysrq+D).

OK. I'll try to do it next time if there is a chance. It may take some time, BTW.
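
Since an already established ssh session stays usable during the hang, the sysrq dumps requested above could in principle be triggered from that shell instead of the console keyboard. The following is only a rough sketch, assuming the kernel exposes /proc/sysrq-trigger (magic sysrq compiled in) and that the session runs as root. The helper names trigger_sysrq and writeback_kb are made up for illustration; the second one just pulls the Dirty and Writeback counters out of /proc/meminfo text as a quick check for stuck writeback.

```python
import re

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # assumption: magic sysrq is compiled in

# sysrq command letters mentioned in this thread
SYSRQ_KEYS = {
    "memory": "m",  # sysrq+M: memory usage summary
    "locks": "d",   # sysrq+D: locks currently held (needs lockdep)
    "regs": "p",    # sysrq+P: registers of the current CPU
    "tasks": "t",   # sysrq+T: full task list
}


def trigger_sysrq(name, trigger_path=SYSRQ_TRIGGER):
    """Write one sysrq command letter; the dump lands in the kernel log."""
    with open(trigger_path, "w") as f:
        f.write(SYSRQ_KEYS[name])


def writeback_kb(meminfo_text):
    """Return the Dirty and Writeback counters (in kB) from /proc/meminfo text."""
    counters = {}
    for line in meminfo_text.splitlines():
        # The colon right after the name keeps WritebackTmp from matching.
        m = re.match(r"(Dirty|Writeback):\s+(\d+)\s+kB", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters
```

On a hung box one might call writeback_kb(open("/proc/meminfo").read()) a few times, a minute apart, and then trigger_sysrq("memory"), trigger_sysrq("locks") and trigger_sysrq("tasks"), collecting the output with dmesg or from a serial/netconsole log. If Writeback stays non-zero and never shrinks, that would fit the stuck-writeback theory.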

Is anything currently running? (sysrq+P and even a full sysrq+T task list
could be useful).

I'll have to check - maybe I have this captured. If not I'll check it next time.
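
As a user-space complement to the sysrq+T task list, the processes stuck in uninterruptible sleep (state D) can be listed from /proc. This is a minimal sketch, not a replacement for the kernel dump: it assumes the usual /proc/<pid>/stat layout (pid, command name in parentheses, then the state letter), only looks at processes rather than individual threads, and the function names parse_stat and blocked_tasks are hypothetical. It also assumes the interpreter can still start on the hung machine, which may not hold if it has to be read from disk.

```python
import os


def parse_stat(stat_text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unexpected format
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Running blocked_tasks() from the surviving ssh session and comparing the result with the sysrq+T output would show which processes (for example the hung "less") are waiting in the kernel.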

Are any IO errors occurring at all?

I didn't notice any, so no.

Thank you.

Best regards,

Krzysztof Olędzki