ISSUE: During backup from nfs mounted drives, network connection is lost

Havard Bell (HBell@Glacialis.no)
Wed, 13 Jan 1999 13:28:34 +0900


1. During backup from nfs mounted drives, network connection is lost

2.
We have a rather complicated local network. Complicated in the sense
that there are many different machines and OSes connected.
Unfortunately, I don't have a full grasp of what is going on with our
network, so the following may not be very helpful.

From my linux-2.2.0-pre5 machine, I try to back up all our unix home
directories, using a locally connected SCSI HP SuperStoreTape DAT
drive. I do this by mounting the directories I need, read-only, on
agnesi (my machine), and running tar through a very simple script. tar
then backs up the chosen directories (some nfs mounted, some local)
onto tape.

Using linux-2.0.36 there has never been a problem.

I tried it several times (with both 2.2.0-pre6 and 2.2.0-pre5), and
the connection to the local network (and thus the Internet) seems to
just die. The following relevant messages appear in /var/log/messages:

[..cut..]
Jan 13 12:34:26 agnesi kernel: nfs: server lucky not responding, still trying
Jan 13 12:34:26 agnesi kernel: nfs: server lucky not responding, still trying
Jan 13 12:34:27 agnesi last message repeated 10 times
Jan 13 12:34:29 agnesi kernel: nfs: task 13929 can't get a request slot
Jan 13 12:34:27 agnesi last message repeated 10 times
Jan 13 12:34:29 agnesi kernel: nfs: task 13929 can't get a request slot
Jan 13 12:34:36 agnesi ypbind[351]: euler.our.domain.name: RPC: Timed out
[..cut..]
Jan 13 12:43:36 agnesi ypbind[351]: Host name lookup failure
Jan 13 12:46:46 agnesi ypbind[351]: Host name lookup failure
Jan 13 12:49:42 agnesi kernel: Socket destroy delayed (r=0 w=1664)
Jan 13 12:49:56 agnesi ypbind[351]: Host name lookup failure
Jan 13 12:53:06 agnesi ypbind[351]: Host name lookup failure
Jan 13 12:56:16 agnesi ypbind[351]: Host name lookup failure
[..cut..]
do_ypcall: clnt_call: RPC: Timed out
do_ypcall: clnt_call: RPC: Timed out
Jan 13 13:02:36 agnesi ypbind[351]: Host name lookup failure
do_ypcall: clnt_call: RPC: Timed out
do_ypcall: clnt_call: RPC: Timed out
do_ypcall: clnt_call: RPC: Timed out
do_ypcall: clnt_call: RPC: Timed out
do_ypcall: clnt_call: RPC: Timed out
do_ypcall: clnt_call: RPC: Timed out
[..cut..]

I have tried to restart the nfs system (rpc.nfsd, rpc.mountd), ypbind
and portmap, but nothing helps.

ypwhich reports: can't yp_bind: Reason: Domain not bound

Not knowing any better, I end up rebooting, and everything is back
to normal. If I don't do the backup, everything runs fine for days.

The error messages occurred when backing up from an nfs mounted
directory whose host is a linux-2.1.105 SMP (dual pentium) system,
and when backing up from a NetBSD system. Backing up from SunOS seemed
to be no problem, nor from a FreeBSD system. Of course the number of
files varies greatly, so this may not mean much.
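
To narrow down which servers are involved, the "not responding" kernel
messages can be tallied per server. This is a minimal sketch, assuming
the message format shown in the log excerpt above. The function name
count_unresponsive is made up for illustration, and syslog's "last
message repeated N times" lines are not counted, so the totals are
lower bounds.

```shell
#!/bin/sh
# Tally "nfs: server NAME not responding" kernel messages per server.
# count_unresponsive is a hypothetical helper; it reads syslog text on stdin
# and prints one "COUNT NAME" line per server.
count_unresponsive() {
    sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' | sort | uniq -c
}

# Example use (path as given in this report):
#   count_unresponsive < /var/log/messages
```

Running it once per backup attempt, on the part of the log written
during that attempt, would show whether the same servers stall every
time.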

NOTE: The "Domain not bound" message has occurred on our network
before. Agnesi has had problems of a similar nature before, which
seemed to come randomly. However, at that time, only one machine on
the local network seemed unreachable; the rest were no problem. We used
a 2.0.36 kernel then. This time, the problem is reproduced every time I
perform the above-mentioned backup.

(My apologies for this confusing report, but I am rather at a loss
with this whole problem, so I am afraid this is the best I can do.)

3.
nfs, networking, backup, i/o

4.
Linux version 2.2.0-pre5 (root@agnesi.nasl.t.u-tokyo.ac.jp) (gcc version egcs-2.91.60 19981201 (egcs-1.1.1 release)) #2 Fri Jan 8 21:29:54 JST 1999

5.
N/A

6.
---shell script we use for backup----
#!/bin/sh
cd /
/bin/mount /.remote/biot/usr/home
/bin/mount /.remote/bernoulli/usr/home/taku
/bin/mount /.remote/newton/koya2
/bin/mount /.remote/euler/casa3
/bin/mount /.remote/archimedes/var
/bin/mount /.remote/archimedes/haus
/bin/mount /.remote/lucky/usr
/bin/tar --same-owner --exclude-from /root/backup/Exclude_Files -cpvf /dev/st0 `cat /root/backup/Include_Files` > /root/backup/Reports/Report
/bin/date > /root/backup/Full_Date
/bin/umount /.remote/biot/usr/home
/bin/umount /.remote/bernoulli/usr/home/taku
/bin/umount /.remote/newton/koya2
/bin/umount /.remote/euler/casa3
/bin/umount /.remote/archimedes/var
/bin/umount /.remote/archimedes/haus
/bin/umount /.remote/lucky/usr
------- end shell script---------
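
The same sequence can also be written with the mount points kept in
one list, so that a failed mount stops the run before tar starts. This
is only a sketch, not the script in use here: MOUNTS, DRY_RUN, run,
do_tar, stamp_date and backup are names introduced for illustration,
and DRY_RUN defaults to 1 so that the commands are printed instead of
executed.

```shell
#!/bin/sh
# Sketch of the backup run with the mount points in one list.
# All variable and function names are illustrative, not from the original script.
MOUNTS="/.remote/biot/usr/home
/.remote/bernoulli/usr/home/taku
/.remote/newton/koya2
/.remote/euler/casa3
/.remote/archimedes/var
/.remote/archimedes/haus
/.remote/lucky/usr"

# 1 = only print what would be done; set DRY_RUN=0 to really run it.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same tar invocation as in the original script.
do_tar() {
    /bin/tar --same-owner --exclude-from /root/backup/Exclude_Files \
        -cpvf /dev/st0 `cat /root/backup/Include_Files` \
        > /root/backup/Reports/Report
}

# Record when the backup was taken, as the original script does.
stamp_date() {
    /bin/date > /root/backup/Full_Date
}

backup() {
    cd / || return 1
    for m in $MOUNTS; do
        # Stop before tar if any mount fails; earlier mounts stay mounted.
        run /bin/mount "$m" || { echo "mount failed: $m" >&2; return 1; }
    done
    run do_tar
    run stamp_date
    for m in $MOUNTS; do
        run /bin/umount "$m"
    done
}

backup
```

With DRY_RUN left at its default, the script only lists the mount,
tar, date and umount steps in order, which makes it safe to check the
list of mount points before a real run with DRY_RUN=0.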

-------part of fstab-----------
# machine:<path> Agnesi's mount point type flags
#
biot:/usr/home /.remote/biot/usr/home nfs noauto,ro
newton:/koya2 /.remote/newton/koya2 nfs noauto,ro
archimedes:/var /.remote/archimedes/var nfs noauto,ro
archimedes:/haus /.remote/archimedes/haus nfs noauto,ro
lucky:/usr /.remote/lucky/usr nfs noauto,ro
euler:/casa3 /.remote/euler/casa3 nfs noauto,ro
island:/yakata /.remote/island/yakata nfs noauto,ro
bernoulli:/usr/home/taku /.remote/bernoulli/usr/home/taku nfs noauto,ro
schrodinger:/usr/home/naka /.remote/schrodinger/usr/home/naka nfs noauto,ro
------end fstab---------------

7.
Agnesi is a Pentium II 400, with 128 MB RAM and a 6 GB HD (IDE) (from
Gateway 2000). Lots of applications running under KDE.

7.1
Linux agnesi.our.domain.name 2.2.0-pre5 #2 Fri Jan 8 21:29:54 JST 1999 i686 unknown
Kernel modules found
Gnu C egcs-2.91.60
Binutils 2.9.1.0.15
Linux C Library 2.0.7
Dynamic linker ldd (GNU libc) 2.0.7
Linux C++ Library 2.9.0
Procps 1.2.9
Mount 2.7l
Net-tools 1.33
Kbd 0.94
Sh-utils 1.16

7.2
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 1
cpu MHz : 398.278283
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx osfxsr
bogomips : 397.31

7.3
(nothing)

7.4
Attached devices:
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: HP Model: HP35480A Rev: T603
Type: Sequential-Access ANSI SCSI revision: 02

Hope this info may be useful.

Feel free to contact me if more info/testing is needed.

Kind regards,

~ Hav.
____________________________________________________________________________
o
Havard Bell
Dept. of Environmental & Ocean Engineering
The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Phone: +81-3-3812 2111 ext. 6553
113-8654 TOKYO Fax : +81-3-3815 8360
JAPAN email: HBell@glacialis.no
____________________________________________________________________________
