Kernel 2.67.17.1 is hanging I/O under AMD64 [Was Re: Kernel2.6.17-rc5 under amd64 may be hanging I/O]

From: Maurice Volaski
Date: Fri Jun 30 2006 - 01:44:23 EST


The problem,at least I think it's the same problem, also occurs in kernel 2.6.17.1.

The relevant output has been posted online at
http://www.kennedy.aecom.yu.edu/misc/drbdstall2.htm


/ 2006-06-08 20:34:10 -0400
\ Maurice Volaski:
It appears that kernel 2.6.17-rc5 under amd64 may be hanging I/O
processes spontaneously at random.

Our setup here uses drbd (ver. 0.7.19), which is a network RAID kernel
module, and initially I was fscking (ext3) filesystems, which are on
drbd devices, and the fsck just stopped spontaneously on the two of
> them.

Today, I tried copying a directory on the command line with cp -a on
this computer (on a drbd-managed device) and then in mid-copy I tried
to abort the process with control-C. It did not abort. I tried killing
it with kill and then with kill -9. It turns out that the process had
died, but is still left in ps.

Just by happenstance,

Anyway, I tried bringing down the peer and bringing it back up and it
stalled.

> I also had an emerge sync (i.e., the Gentoo update mechanism) going
and it too got stuck, but it doesn't, or at least shouldn't affect
drbd disks, implying this is not a drbd bug, but a kernel bug.


well. it says below that emerge hangs is drbd_al_begin_io, so it is at
least drbd related, too.

>* what are the numbers in /proc/drbd

This is how it appears long after the copy hang and I stopped and just
restarted drbd on the peer. The output from then hanging computer:

thanks, that might help in debugging things.
I'll have a look, maybe I can find something in there.


--

Maurice Volaski, mvolaski@xxxxxxxxxxxx
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/