Re: PROBLEM: Sustained network traffic dies after a few seconds

Jonathan Foster Kliegman (jk8q+@andrew.cmu.edu)
Mon, 28 Jun 1999 02:36:43 +0000


Hans wrote:
>
> Dear kernel-hackers,
>
> Here's a problem I struggled with for the last three weeks.
> What follows complies with the format as described in
> /usr/src/linux/REPORTING-BUGS
>
> 1. Sustained network traffic dies after a few seconds

I've noticed a similar (perhaps the same?) problem here with a custom
network driver we're developing. It has to do with skbuffs not being
reclaimed correctly. Somehow, with our code, the skb->users value is 1
when we enter the dev->hard_start_xmit function, but by the time we go to
free it later it's been set to 2. Looking at other drivers on the system
(namely tulip.o), it is still 1 when they go to free it. We're
currently unsure why this happens, though, so I'm afraid I'm not much
help.
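
To make the counting concrete, here is a small userspace model of what we
think is going on. It is only a sketch, not kernel code: fake_skb,
fake_alloc, fake_get, fake_free and refcount_demo are made-up names for
illustration, and the only assumption is that a buffer is released when its
user count drops to zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skbuff: only the reference count matters. */
struct fake_skb {
    int users;                     /* models skb->users */
};

static struct fake_skb *fake_alloc(void)
{
    struct fake_skb *skb = malloc(sizeof(*skb));
    if (skb == NULL)
        abort();
    skb->users = 1;                /* a new buffer starts with one holder */
    return skb;
}

/* Take an extra reference, as some code path apparently does to our skb. */
static void fake_get(struct fake_skb *skb)
{
    skb->users++;
}

/* Drop one reference; release the memory only when the last one goes away.
 * Returns 1 if the buffer was really released, 0 otherwise. */
static int fake_free(struct fake_skb *skb)
{
    if (--skb->users > 0)
        return 0;
    free(skb);
    return 1;
}

/* Runs both cases side by side and returns how many of the two buffers
 * were released by a single free call each. */
int refcount_demo(void)
{
    struct fake_skb *good = fake_alloc();
    struct fake_skb *bad = fake_alloc();
    int released = 0;

    fake_get(bad);                 /* the unexplained extra reference */
    released += fake_free(good);   /* users 1 -> 0: released */
    released += fake_free(bad);    /* users 2 -> 1: not released */

    fake_free(bad);                /* tidy up so the demo itself doesn't leak */
    return released;
}
```

With users at 1 a single free releases the buffer; with users at 2 the same
call only decrements the count and the memory stays allocated, which would
match what we see.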

As a side note, to see how similar our problems are: a ping flood would
die at exactly 512 packets on incoming requests, 513 packets for
outgoing requests. If we rmmod then insmod our driver it would work
again for 512 more packets, but we leaked memory each time we did
this (our belief was that the leak was those unfreed skbuffs).
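
We don't know where the 512 comes from. Purely as a guess, if each unfreed
buffer eats into some fixed budget, the behaviour would look like the sketch
below. Everything in it is hypothetical (fake_pool, pool_take,
pool_give_back, packets_sent), and BUDGET is just our observed packet count
plugged in, not a value taken from the kernel.

```c
#include <assert.h>

#define BUDGET 512                 /* assumption: the count we observed */

/* Hypothetical fixed-size budget of transmit buffers. */
struct fake_pool {
    int in_use;
};

/* Returns 1 if a buffer could be taken, 0 if the budget is used up. */
static int pool_take(struct fake_pool *p)
{
    if (p->in_use >= BUDGET)
        return 0;
    p->in_use++;
    return 1;
}

/* Hand one buffer back to the budget. */
static void pool_give_back(struct fake_pool *p)
{
    if (p->in_use > 0)
        p->in_use--;
}

/* Tries to send `attempts` packets. If `leak` is nonzero, buffers are never
 * handed back. Returns how many packets went out before traffic stopped. */
int packets_sent(int attempts, int leak)
{
    struct fake_pool pool = { 0 };
    int sent = 0;

    for (int i = 0; i < attempts; i++) {
        if (!pool_take(&pool))
            break;                 /* budget exhausted: traffic dies here */
        sent++;
        if (!leak)
            pool_give_back(&pool);
    }
    return sent;
}
```

In this model a leaking sender stops after exactly BUDGET packets no matter
how many it attempts, while a sender that hands its buffers back keeps going.
Reloading the driver would correspond to starting with a fresh pool.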

-Jon

> 2. When a transfer is started over the (ether)network, it stops after a few
> seconds. Communications between the two hosts can only be re-established
> by doing "ifconfig ... down" and "ifconfig ... up" (on BOTH sides) or
> rebooting (in case of a VXT2000). No error-messages. ifconfig simply
> reports some collisions/errors/drops.
> This is very reproducible by doing rcp, ftp or tftp a file that is larger
> than 1Mb or so. An even quicker way to reproduce it, is a "ping -f ...".
> This happens between 2 linux-PC's but also on a linux-PC that pumps an
> image (1.2Mb) to an X-terminal (VXT2000) with tftp.
> This problem appeared when I upgraded from RedHat 5.2 (kernel 2.0.36) to
> SuSE 6.1 (kernel 2.2.5). I also tried the SuSE 6.1 upgrade (kernel 2.2.7)
> and after somebody told me the 2.2.7 is buggy, I tried 2.2.9 (straight
> from www.linux.org). Still the same problem. Then I perused THOUSANDS
> of articles in comp.os.linux.networking. Two cases looked similar to this
> problem and the solution was: upgrade to 2.2.10. Well, 2.2.10 still had
> this problem.
> I checked my cabling and everything that might be causing this and can
> not find a solution to this problem. Note that when I boot my old 2.0.36
> kernel, THE PROBLEM DOES NOT OCCUR (which tells me that it is no hardware
> problem).
> Other things I tried, but did not solve the problem:
> - switching off masquerading
> - not loading the vmmon and vmnet modules from vmware
> - not enabling my second network card (to the internet via a cable modem)
> - disable ip-accounting (with ipchains)
> - switching off "persistent DMA" in the sound configuration
>
> Becoming desperate by now, I've taken the liberty to post this to you.
>
> 3. Keyword(s): Networking
>
> 4. cat /proc/version
>
> Linux version 2.2.10 (root@n320) (gcc version egcs-2.91.66 19990314
> (egcs-1.1.2 release)) #1 Sun Jun 27 16:29:56 MEST 1999
>
> 5. oops (not applicable)
>
> 6. how to reproduce: ping -f <any host on this thinwire ethernet> and the
> screen fills with dots after a few seconds
>
> 7. Environment
>
> 7.1 Output of /usr/src/linux/scripts/ver_linux:
>
> -- Versions installed: (if some fields are empty or looks
> -- unusual then possibly you have very old versions)
> Linux n320 2.2.10 #1 Sun Jun 27 16:29:56 MEST 1999 i686 unknown
> Kernel modules 2.2.2-pre6
> Gnu C egcs-2.91.66
> Binutils 2.9.1.0.22
> Linux C Library x 1 root root 2475225 Apr 15 01:57 /lib/libc.so.6
> Dynamic linker ldd (GNU libc) 2.0.7
> Procps 1.2.11
> Mount 2.9i
> Net-tools 1.50
> Kbd 0.99
> Sh-utils 1.12
> Modules Loaded ip_masq_user
>
> 7.2 cat /proc/cpuinfo
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 5
> model name : Pentium II (Deschutes)
> stepping : 2
> cpu MHz : 350.803505
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> sep_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx osfxsr
> bogomips : 349.80
>
> 7.3 cat /proc/modules (almost everything, including sound, scsi and
> networking, is compiled directly into the kernel and not as modules)
>
> ip_masq_user 2312 0 (autoclean)
>
> 7.4 cat /proc/scsi/scsi
>
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
> Vendor: IBM Model: DDRS-39130D Rev: DC1B
> Type: Direct-Access ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 01 Lun: 00
> Vendor: DEC Model: RZ26L (C) DEC Rev: 440C
> Type: Direct-Access ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 02 Lun: 00
> Vendor: IBM Model: DDRS-39130D Rev: DC1B
> Type: Direct-Access ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 03 Lun: 00
> Vendor: PLEXTOR Model: CD-R PX-R412C Rev: 1.05
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 04 Lun: 00
> Vendor: PLEXTOR Model: CD-ROM PX-32TS Rev: 1.03
> Type: CD-ROM ANSI SCSI revision: 02
> Host: scsi0 Channel: 00 Id: 06 Lun: 00
> Vendor: DEC Model: TLZ06 (C)DEC Rev: 4BQE
> Type: Sequential-Access ANSI SCSI revision: 02
>
> 7.5 Whatever might be relevant. Well, I guess /var/log/boot.msg contains
> most info I did not include so far, so here it is:
>
> Inspecting /boot/System.map
> Loaded 7677 symbols from /boot/System.map.
> Symbols match kernel version 2.2.10.
> No module symbols loaded.
> klogd 1.3-3, log source = /proc/kmsg started.
> <4>Linux version 2.2.10 (root@n320) (gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release)) #1 Sun Jun 27 16:29:56 MEST 1999
> <4>Detected 350803505 Hz processor.
> <4>Console: colour VGA+ 80x25
> <4>Calibrating delay loop... 349.80 BogoMIPS
> <4>Memory: 127752k/131008k available (1072k kernel code, 412k reserved, 1708k data, 64k init)
> <4>CPU: Intel Pentium II (Deschutes) stepping 02
> <6>Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
> <6>Checking 'hlt' instruction... OK.
> <4>POSIX conformance testing by UNIFIX
> <4>mtrr: v1.35 (19990512) Richard Gooch (rgooch@atnf.csiro.au)
> <4>PCI: PCI BIOS revision 2.10 entry at 0xf0720
> <4>PCI: Using configuration type 1
> <4>PCI: Probing PCI hardware
> <6>Linux NET4.0 for Linux 2.2
> <6>Based upon Swansea University Computer Society NET3.039
> <6>NET4: Unix domain sockets 1.0 for Linux NET4.0.
> <6>NET4: Linux TCP/IP 1.0 for NET4.0
> <6>IP Protocols: ICMP, UDP, TCP
> <4>Starting kswapd v 1.5
> <6>parport0: PC-style at 0x378 [SPP,ECP,ECPEPP,ECPPS2]
> <4>parport0: detected irq 7; use procfs to enable interrupt-driven operation.
> <6>Detected PS/2 Mouse Port.
> <6>Serial driver version 4.27 with no serial options enabled
> <6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
> <6>ttyS01 at 0x02f8 (irq = 3) is a 16550A
> <4>pty: 256 Unix98 ptys configured
> <6>lp0: using parport0 (polling).
> <6>Real Time Clock Driver v1.09
> <7>Sound initialization started
> <4><Sound Blaster 16 (4.12)> at 0x220 irq 5 dma 1,5
> <4><Sound Blaster 16> at 0x330 irq 5 dma 0
> <4><Yamaha OPL3> at 0x388
> <4><SoundBlaster EMU8000 (RAM2048k)>
> <7>Sound initialization complete
> <4>RAM disk driver initialized: 16 RAM disks of 4096K size
> <4>PIIX4: IDE controller on PCI bus 00 dev 21
> <4>PIIX4: not 100% native mode: will probe irqs later
> <4> ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:pio
> <4>hda: ATAPI CD-ROM DRIVE 36X MAXIMUM, ATAPI CDROM drive
> <4>hdb: IOMEGA ZIP 100 ATAPI, ATAPI FLOPPY drive
> <4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> <4>hda: ATAPI 32X CD-ROM drive, 128kB Cache
> <6>Uniform CDROM driver Revision: 2.55
> <6>hdb: 98304kB, 196608 blocks, 512 sector size
> <6>hdb: 98304kB, 96/64/32 CHS, 4096 kBps, 512 sector size, 2941 rpm
> <6>Floppy drive(s): fd0 is 1.44M
> <6>FDC 0 is a post-1991 82077
> <6>ncr53c8xx: at PCI bus 0, device 12, function 0
> <6>ncr53c8xx: 53c875J detected with Symbios NVRAM
> <6>ncr53c875J-0: rev=0x04, base=0xe1000000, io_port=0xb400, irq=9
> <6>ncr53c875J-0: Symbios format NVRAM, ID 7, Fast-20, Parity Checking
> <6>ncr53c875J-0: initial SCNTL3/DMODE/DCNTL/CTEST3/4/5 = (hex) 05/4e/80/01/00/24
> <6>ncr53c875J-0: final SCNTL3/DMODE/DCNTL/CTEST3/4/5 = (hex) 05/46/a0/00/08/24
> <6>ncr53c875J-0: on-chip RAM at 0xe0800000
> <4>ncr53c875J-0: resetting, command processing suspended for 2 seconds
> <6>ncr53c875J-0: restart (scsi reset).
> <4>ncr53c875J-0: enabling clock multiplier
> <4>ncr53c875J-0: Downloading SCSI SCRIPTS.
> <4>scsi0 : ncr53c8xx - version 3.2
> <4>scsi : 1 host.
> <4>ncr53c875J-0: command processing resumed
> <4> Vendor: IBM Model: DDRS-39130D Rev: DC1B
> <4> Type: Direct-Access ANSI SCSI revision: 02
> <4>Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> <4> Vendor: DEC Model: RZ26L (C) DEC Rev: 440C
> <4> Type: Direct-Access ANSI SCSI revision: 02
> <4>Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
> <4> Vendor: IBM Model: DDRS-39130D Rev: DC1B
> <4> Type: Direct-Access ANSI SCSI revision: 02
> <4>Detected scsi disk sdc at scsi0, channel 0, id 2, lun 0
> <4> Vendor: PLEXTOR Model: CD-R PX-R412C Rev: 1.05
> <4> Type: CD-ROM ANSI SCSI revision: 02
> <4>Detected scsi CD-ROM sr0 at scsi0, channel 0, id 3, lun 0
> <4> Vendor: PLEXTOR Model: CD-ROM PX-32TS Rev: 1.03
> <4> Type: CD-ROM ANSI SCSI revision: 02
> <4>Detected scsi CD-ROM sr1 at scsi0, channel 0, id 4, lun 0
> <4> Vendor: DEC Model: TLZ06 (C)DEC Rev: 4BQE
> <4> Type: Sequential-Access ANSI SCSI revision: 02
> <4>Detected scsi tape st0 at scsi0, channel 0, id 6, lun 0
> Kernel logging (proc) stopped.
> Kernel log daemon terminating.
>
> And the relevant stuff from /var/log/messages (the networkcards):
>
> Jun 27 16:39:32 n320 kernel: ne2k-pci.c:v0.99L 2/7/98 D. Becker/P. Gortmaker http://cesdis.gsfc.nasa.gov/linux/drivers/ne2k-pci.html
> Jun 27 16:39:32 n320 kernel: ne2k-pci.c: PCI NE2000 clone 'RealTek RTL-8029' at I/O 0xd000, IRQ 11.
> Jun 27 16:39:32 n320 kernel: eth0: PCI NE2000 found at 0xd000, IRQ 11, 00:00:E8:55:39:1C.
> Jun 27 16:39:32 n320 kernel: ne2k-pci.c: PCI NE2000 clone 'RealTek RTL-8029' at I/O 0xb800, IRQ 10.
> Jun 27 16:39:32 n320 kernel: eth1: PCI NE2000 found at 0xb800, IRQ 10, 00:00:E8:55:2E:02.
> Jun 27 16:39:32 n320 kernel: Partition check:
> Jun 27 16:39:32 n320 kernel: sda: sda1 sda2 sda3 sda4
> Jun 27 16:39:32 n320 kernel: sdb: sdb1 sdb2
> Jun 27 16:39:32 n320 kernel: sdc: sdc1 sdc2 sdc3 sdc4
> Jun 27 16:39:32 n320 kernel: hdb: hdb4
> Jun 27 16:39:32 n320 kernel: VFS: Mounted root (ext2 filesystem) readonly.
> Jun 27 16:39:32 n320 kernel: Freeing unused kernel memory: 64k freed
> Jun 27 16:39:32 n320 kernel: Adding Swap: 125996k swap-space (priority -1)
> Jun 27 16:39:32 n320 named[120]: starting
> Jun 27 16:39:32 n320 named[120]: cache zone "" loaded (serial 0)
> Jun 27 16:39:32 n320 named[120]: primary zone "0.0.127.in-addr.arpa" loaded (serial 1997022700)
> Jun 27 16:39:32 n320 named[123]: Ready to answer queries.
> Jun 27 16:39:33 n320 lpd[148]: restarted
> Jun 27 16:39:36 n320 sshd2[160]: Listener created on port 22.
> Jun 27 16:39:36 n320 sshd2[165]: Daemon is running.
> Jun 27 16:39:36 n320 ntpdate[164]: step time server 193.78.241.12 offset -0.017820 sec
> Jun 27 16:39:36 n320 xntpd[169]: ntpd 4.0.92c Thu Apr 15 05:46:04 GMT 1999 (1)
> Jun 27 16:39:36 n320 xntpd[169]: precision = 14 usec
> Jun 27 16:39:36 n320 xntpd[169]: using kernel phase-lock loop 0041
> Jun 27 16:39:36 n320 xntpd[169]: frequency initialized -4.306 from /etc/ntp.drift
> Jun 27 16:39:36 n320 xntpd[169]: using kernel phase-lock loop 0041
>
> Well, that's it. I would be very grateful if you can help me. I invested quite
> some time in my transition from 2.0.36 to 2.2.* and I'm willing to help you
> to solve this problem by providing any information you need.
>
> I'm looking forward to your reaction and appreciate any time and effort you
> will spend on this problem,
>
> Yours sincerely,
> Hans de Hartog.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/