Re: Kernel Lockup - Was: Probably 2.4 kernel or AIC7xxx module trouble

From: Jim Gifford (maillist@jg555.com)
Date: Sat Jul 05 2003 - 11:31:08 EST


Any suggestions out there? The temporary fix is to run sync every hour. It
has now been up for 1 day and 4 hours, which is a record for me.
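
For anyone wanting to try the same workaround, here is a minimal sketch of a periodic flush loop. The function name periodic_sync and the SYNC_CMD override are made up for illustration (SYNC_CMD only exists so the loop can be dry-run without touching the disks); the interval is in seconds and a count of 0 means "run until killed".

```shell
#!/bin/sh
# Hypothetical helper: flush dirty buffers every INTERVAL seconds.
#   $1 = interval in seconds (default 3600, i.e. once an hour)
#   $2 = number of iterations (default 0 = loop forever)
# SYNC_CMD is an illustrative override; it defaults to the real sync.
periodic_sync() {
    interval=${1:-3600}
    count=${2:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        "${SYNC_CMD:-sync}"
        i=$((i + 1))
        # Log a timestamp so the last completed flush is known after a lockup.
        echo "$(date '+%Y-%m-%d %H:%M:%S') sync #$i done"
        sleep "$interval"
    done
}
```

Something like `periodic_sync 3600 >> /root/sync.log &` would start it in the background (the log path is only an example). A crontab entry such as `0 * * * * /bin/sync` should have the same effect without a long-running shell, assuming sync lives in /bin on your system.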

----- Original Message -----
From: "Jim Gifford" <maillist@jg555.com>
To: <linux-kernel@vger.kernel.org>
Sent: Friday, July 04, 2003 11:15 AM
Subject: Kernel Lockup - Was: Probably 2.4 kernel or AIC7xxx module trouble

> I wrote a small script last night to check my problem.
>
> #!/bin/sh
> # Remove any snapshot left over from a previous run.
> rm -f /root/ps.dat
> while true; do
>     # Date the file, or you could as well use the file's date field.
>     date > /root/ps.dat
>     echo "====" >> /root/ps.dat
>     # Take a snapshot of what's running on the machine.
>     ps aux >> /root/ps.dat
>     # End marker, to be sure we didn't crash while ps'ing.
>     echo "====" >> /root/ps.dat
>     # To prevent FS corruption
>     sync
>     # Have a little sleep... |-O
>     sleep 1
> done
>
> The weird part is that the system didn't lock up last night like it usually
> does.
>
> I also had vmstat -i running all night to help locate the problem.
>
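One thing that may help when leaving vmstat running overnight: its lines carry no time of day, so after a lockup it is hard to tell when the last sample was written. Below is a rough sketch of a filter that prefixes each line with a timestamp. The name stamp_lines is made up for illustration, and the timestamps reflect when each line reaches the filter, which could lag if the pipe is buffered.

```shell
#!/bin/sh
# Hypothetical filter: prefix every line read from stdin with the
# current date and time, then write it to stdout.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

It could be used as `vmstat 5 | stamp_lines >> /root/vmstat.log`, where the 5-second delay and the log path are only examples.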
> Here is the output of both files and the load average.
>
> At startup
> top - 13:38:10 up 5 min, 1 user, load average: 3.70, 1.78, 0.72
> Tasks: 106 total, 3 running, 103 sleeping, 0 stopped, 0 zombie
> Cpu0 : 30.8% user, 13.5% system, 55.8% nice, 0.0% idle
> Cpu1 : 39.7% user, 18.6% system, 41.7% nice, 0.0% idle
> Mem: 1033896k total, 195548k used, 838348k free, 7188k buffers
> Swap: 1060280k total, 0k used, 1060280k free, 45132k cached
>
> vmstat
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  4  1      0 840600   5108  39192    0    0    94    29   61   185 44  9 47  0
>  4  2      0 835736   5196  39236    0    0    36   524  146   947 78 22  0  0
>  5  0      0 845516   5208  39224    0    0     8   272  195   876 87 13  0  0
>  3  0      0 845772   5212  39260    0    0    36     0  122   499 92  8  0  0
>  4  0      0 845536   5240  39268    0    0     0   236  133   345 91  9  0  0
>  6  0      0 840236   5328  39272    0    0     0   480  173   866 82 18  0  0
>  7  0      0 838084   5344  39924    0    0   668     0  226   833 84 16  0  0
>  3  0      0 838124   5376  39940    0    0     0   344  147   941 80 20  0  0
>  4  0      0 837800   5388  39940    0    0     4    64  113   272 93  7  0  0
>  5  0      0 838800   5412  39952    0    0     0   296  136  1035 82 18  0  0
>
> ps
>
> The ps data is the same as the other one.
>
> After 17 hours
>
> top - 11:11:51 up 17:13, 2 users, load average: 3.72, 3.71, 3.79
> Tasks: 114 total, 4 running, 110 sleeping, 0 stopped, 0 zombie
> Cpu0 : 28.7% user, 18.4% system, 52.9% nice, 0.0% idle
> Cpu1 : 33.1% user, 17.0% system, 49.8% nice, 0.0% idle
> Mem: 1033896k total, 453256k used, 580640k free, 109232k buffers
> Swap: 1060280k total, 0k used, 1060280k free, 158876k cached
>
> vmstat
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  6  0      0 580424 109232 158892    0    0     0   112  111   678 74 26  0  0
>  4  0      0 580680 109232 158896    0    0     0   352  153   882 70 30  0  0
>  3  0      0 579696 109232 158884    0    0     0     0  104   752 70 30  0  0
>  4  0      0 579416 109232 158896    0    0     0   280  134   429 73 27  0  0
>  4  0      0 579868 109232 158892    0    0     0     0  111   903 62 38  0  0
>  5  0      0 580032 109232 158896    0    0     0   324  141   789 68 32  0  0
>  2  1      0 580264 109232 158896    0    0     0   280  112   509 81 19  0  0
>  3  0      0 579360 109232 158896    0    0     0     0  137   691 72 28  0  0
>  3  2      0 578780 109232 158896    0    0     0   208  130   927 65 35  0  0
>  2  0      0 579412 109232 158896    0    0     0   172  139   646 73 27  0  0
>  5  2      0 580436 109232 158896    0    0     0   408  156   558 73 27  0  0
>  3  0      0 580312 109232 158896    0    0     0   112  143   896 74 26  0  0
>  2  1      0 580976 109232 158896    0    0     0   304  138  1059 68 32  0  0
>  7  0      0 579768 109232 158896    0    0     0     0  116   129 88 13  0  0
>  4  1      0 580208 109232 158896    0    0     0   388  119   994 71 29  0  0
>  3  0      0 580100 109232 158896    0    0     0     0  132   878 75 25  0  0
>  2  0      0 579388 109232 158896    0    0     0   280  135   471 83 17  0  0
>  4  0      0 580316 109232 158884    0    0     0     0  105   686 80 20  0  0
>  4  0      0 580360 109232 158896    0    0     0   312  138   938 72 28  0  0
>  3  0      0 579348 109232 158888    0    0     0   140  120   670 77 23  0  0
>  6  0      0 580232 109232 158896    0    0     0   264  150   671 76 24  0  0
>
> ps
>
> Fri Jul 4 11:14:45 PDT 2003
> ====
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 1 0.0 0.0 1400 548 ? S Jul03 0:05 init
> root 2 0.0 0.0 0 0 ? SW Jul03 0:00 [keventd]
> root 3 0.0 0.0 0 0 ? SWN Jul03 0:00 [ksoftirqd_CPU0]
> root 4 0.0 0.0 0 0 ? SWN Jul03 0:00 [ksoftirqd_CPU1]
> root 5 0.0 0.0 0 0 ? SW Jul03 0:00 [kswapd]
> root 6 0.0 0.0 0 0 ? SW Jul03 0:00 [bdflush]
> root 7 0.0 0.0 0 0 ? SW Jul03 0:03 [kupdated]
> root 17 0.0 0.0 0 0 ? SW Jul03 0:00 [ahc_dv_0]
> root 18 0.0 0.0 0 0 ? SW Jul03 0:00 [ahc_dv_1]
> root 19 0.0 0.0 0 0 ? SW Jul03 0:00 [scsi_eh_1]
> root 20 0.0 0.0 0 0 ? SW Jul03 0:00 [scsi_eh_2]
> root 22 0.0 0.0 0 0 ? SW Jul03 0:00 [khubd]
> root 31 0.1 0.0 0 0 ? SW Jul03 2:00 [kjournald]
> root 50 0.0 0.0 1556 808 ? S Jul03 0:35 devfsd /dev
> root 73 0.0 0.0 0 0 ? SW Jul03 0:00 [kjournald]
> root 74 0.0 0.0 0 0 ? SW Jul03 0:43 [kjournald]
> root 75 0.0 0.0 0 0 ? SW Jul03 0:10 [kjournald]
> root 227 0.0 0.0 1600 596 ? S Jul03 0:00 gpm -m /dev/psaux -t ps2
> root 242 0.0 0.0 1888 964 ? S Jul03 0:35 syslog-ng
> root 271 0.0 0.1 2548 1584 ? S Jul03 0:13 dhcpd -cf /etc/dhcp/dhcpd.conf -q eth1
> root 1563 1.4 0.3 5516 3712 ? S Jul03 15:30 /usr/bin/perl /etc/zorbiptraffic/zorbcount.pl
> root 1570 0.0 0.1 2404 1192 ? S Jul03 0:00 /bin/sh /usr/bin/safe_mysqld --socket=/var/sockets/mysql.sock
> mysql 1588 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1591 0.0 1.3 50204 13904 ? S Jul03 0:04 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1592 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1593 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1594 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1595 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> root 1596 0.0 0.6 12360 6968 ? S Jul03 0:00 clamd
> root 1597 0.0 0.6 12360 6968 ? S Jul03 0:04 clamd
> root 1598 64.5 0.6 12360 6968 ? R Jul03 667:30 clamd
> root 1600 0.0 0.6 12360 6968 ? S Jul03 0:07 clamd
> mysql 1605 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1606 0.2 1.3 50204 13904 ? S Jul03 2:52 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1607 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1608 0.0 1.3 50204 13904 ? S Jul03 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> mysql 1609 0.3 1.3 50204 13904 ? S Jul03 3:09 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --us$
> spam 1698 0.0 2.0 23692 21528 ? S Jul03 0:08 /usr/bin/perl /usr/bin/spamd -d -c -a -q -x -u spam
> courier 1772 0.0 0.0 1372 436 ? S Jul03 0:00 /usr/sbin/courierfilter start
> courier 1774 0.0 0.0 1368 344 ? S Jul03 0:00 courierlogger courierfilter
> root 1790 0.0 0.0 2840 760 ? S Jul03 0:00 /usr/libexec/authlib/authdaemond.mysql start
> root 1792 0.0 0.1 2920 1172 ? S Jul03 0:01 /usr/libexec/authlib/authdaemond.mysql start
> root 1793 0.0 0.1 2920 1172 ? S Jul03 0:01 /usr/libexec/authlib/authdaemond.mysql start
> root 1794 0.0 0.1 2920 1172 ? S Jul03 0:01 /usr/libexec/authlib/authdaemond.mysql start
> root 1795 0.0 0.1 2920 1172 ? S Jul03 0:02 /usr/libexec/authlib/authdaemond.mysql start
> root 1798 0.0 0.1 2920 1172 ? S Jul03 0:01 /usr/libexec/authlib/authdaemond.mysql start
> root 1815 0.0 0.0 3212 1024 ? S Jul03 0:00 /usr/libexec/courier/courierd
> courier 1845 0.0 0.0 2488 888 ? S Jul03 0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -$
> courier 1858 0.0 0.0 1376 480 ? S Jul03 0:00 /usr/sbin/courierlogger courieresmtpd
> courier 1872 0.0 0.0 2488 888 ? S Jul03 0:00 /usr/sbin/couriertcpd -stderrlogger=/usr/sbin/courierlogger -$
> courier 1884 0.0 0.0 1376 480 ? S Jul03 0:00 /usr/sbin/courierlogger esmtpd-ssl
> root 1899 0.0 0.0 2204 616 ? S Jul03 0:00 /usr/sbin/couriertcpd -address=0 -stderrlogger=/usr/sbin/cour$
> root 1908 0.0 0.0 1368 344 ? S Jul03 0:00 /usr/sbin/courierlogger pop3d
> root 1921 0.0 0.0 2204 616 ? S Jul03 0:00 /usr/sbin/couriertcpd -address=0 -stderrlogger=/usr/sbin/cour$
> root 1931 0.0 0.0 1368 344 ? S Jul03 0:00 /usr/sbin/courierlogger pop3d-ssl
> root 1951 0.0 0.0 2204 620 ? S Jul03 0:00 /usr/sbin/couriertcpd -address=0 -stderrlogger=/usr/sbin/cour$
> root 1961 0.0 0.0 1376 480 ? S Jul03 0:00 /usr/sbin/courierlogger imapd
> root 1979 0.0 0.0 2204 620 ? S Jul03 0:01 /usr/sbin/couriertcpd -address=0 -stderrlogger=/usr/sbin/cour$
> root 1987 0.0 0.0 1376 480 ? S Jul03 0:02 /usr/sbin/courierlogger imapd-ssl
> root 1996 0.0 0.0 1480 664 ? S Jul03 0:02 fcron
> root 2002 0.0 0.1 4812 1912 ? S Jul03 0:14 cupsd
> bin 2611 0.0 0.0 1484 456 ? S Jul03 0:00 portmap
> root 2623 0.0 0.0 1816 952 ? S Jul03 0:02 chronyd -f /etc/
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


This archive was generated by hypermail 2b29 : Mon Jul 07 2003 - 22:00:24 EST