Processes hanging in 'D' state after D64S breakdown

Kurt Huwig (kurt@huwig.de)
Fri, 16 Apr 1999 20:13:27 +0200


Hello!

The slave device of my D64S ISDN leased line went down with this log
entries:

Apr 15 10:07:45 smuf kernel: isdn: ELSA,ch0 cause: L002F
Apr 15 10:07:45 smuf kernel: isdn0: remote hangup (0)
Apr 15 10:07:45 smuf kernel: isdn0: Chargesum is 0
Apr 15 10:07:45 smuf kernel: idx=4 drv=2 ch=0
Apr 15 10:07:45 smuf kernel: isdn: ELSA,ch1 cause: L002F
Apr 15 10:07:45 smuf kernel: isdn0s: remote hangup (0)
Apr 15 10:07:45 smuf kernel: isdn0s: Chargesum is 0
Apr 15 10:07:45 smuf kernel: idx=5 drv=2 ch=1
Apr 15 10:07:45 smuf kernel: isdn0: call from LEASED2 -> 1 accepted
Apr 15 10:07:45 smuf kernel: ICALLslv: isdn0s
Apr 15 10:07:45 smuf kernel: master=isdn0
Apr 15 10:07:45 smuf kernel: master offline
Apr 15 10:07:45 smuf kernel: mlpf: 1
Apr 15 10:07:45 smuf kernel: isdn_tty: call from LEASED2 -> 2 ignored
Apr 15 10:07:46 smuf kernel: isdn_net: isdn0 connected
Apr 15 10:07:46 smuf kernel: isdn_net: chargetime of isdn0 now 252439919

At 13:18 I did realize this and restarted the ISDN link (HiSax unloaded)
and everything seemed fine again.

I run the MRTG which connects to my providers Cisco every 5 minutes. In
the evening, I found some hanging processes and stopped the cron
process.

root 12330 0.0 0.3 1116 240 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 12787 0.0 0.3 1116 248 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 13138 0.0 0.3 1116 228 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 13468 0.0 0.3 1116 232 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 13975 0.0 0.3 1116 248 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 14490 0.0 0.4 1116 280 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 15130 0.0 0.3 1116 248 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 15635 0.0 0.3 1116 244 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 15933 0.0 0.4 1116 260 ? D Apr 15 0:00
/usr/bin//rateup /usr
root 16300 0.0 0.4 1116 280 ? D 20:10 0:00
/usr/bin//rateup /usr
root 16573 0.0 0.4 1116 260 ? D 20:40 0:00
/usr/bin//rateup /usr

The interesting thing is, that this program should run every 5 minutes,
but not all of them hanged. All of them look like this

smuf:/var/data # ps lnwax | grep rate
100000 0 12330 12294 0 0 1116 240 1121b5 D ffff 0:00
/usr/bin//rateup /usr/local/httpd/htdocs/huwig.de/www/mrtg/
xxx.xxx.de.14 924181

i.e. hanging at 0x1121b5; from System.map:

00112118 T __up
00112130 T __do_down
001121f0 T __down

'lsof' gives:

smuf:/var/data # lsof | grep 16573
rateup 16573 root cwd DIR 8,6 5120 192036 /var
(/dev/sda6)
rateup 16573 root rtd DIR 8,5 1024 2 /
rateup 16573 root mem REG 8,8 72093 27986 /usr
(/dev/sda8)
rateup 16573 root mem REG 8,5 149372 20458
/lib/ld-2.0.7.so
rateup 16573 root mem REG 8,5 384749 20465
/lib/libm.so.6
rateup 16573 root mem REG 8,5 2478585 20461
/lib/libc.so.6
rateup 16573 root 0r FIFO 0x0092d000 0 PIPE <-
rateup 16573 root 1w FIFO 0x02394000 0 PIPE ->
rateup 16573 root 2w FIFO 0x008dc000 0 PIPE ->
rateup 16573 root 3r CHR 1,3 0t0 12666
/dev/null

'netstat -a' shows no connections to the cisco, but 8 connections to
Squid are hanging in "CLOSE" state for several weeks.

Any clues? Please reply via eMail, as I am not subscribed.

Kurt

System-data:

Linux version 2.0.36 (root@smuf) (gcc version 2.7.2.3)
SuSE 6.0 glibc
ELSA Quickstep 1000PCI ISDN
2x TeleINT HFC ISDN
ISDN subsystem Rev: 1.44.2.9/1.41.2.11/1.48.2.27/1.28.2.2/1.8.2.2
HiSax: Version 3.1 (module)
HiSax: Layer1 Revision 1.15.2.19
HiSax: Layer2 Revision 1.10.2.11
HiSax: TeiMgr Revision 1.8.2.7
HiSax: Layer3 Revision 1.10.2.6
HiSax: LinkLayer Revision 1.30.2.13

---------------------------------------------------------------
Win-Installation: How often do YOU want to boot today?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/