Getting lousy NFS + tar-pipe throughput on 2.4.20

From: Timothy Miller
Date: Fri Feb 13 2004 - 18:44:34 EST


I'm running a fresh install of RH9 (kernel 2.4.20-something) on a workstation. The workstation is an Athlon 3200+ with 512 megs of RAM on an ABIT KV7 (KT600 chipset). The ethernet controller built into the KV7 is "VIA RhineII". The file system is ext3.


We are mounting an NFS filesystem from a Sun box using automount, and we're using a tar-pipe to move data from the server to the workstation. Both tars of the tar-pipe are running on the workstation, so the network traffic is all NFS.
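For reference, the tar-pipe arrangement described above can be sketched roughly as follows. This is only an illustration, not the exact command that was run: the function name `tar_pipe` and its arguments are hypothetical, and it assumes a `tar` binary that accepts `-C` (GNU tar does). Both tar processes run locally, so if the source directory is an NFS mount, every file read crosses the network as NFS traffic.

```python
import subprocess
import time


def tar_pipe(src_dir, dst_dir):
    """Copy src_dir into dst_dir using two local tar processes joined by a pipe.

    Returns the elapsed wall-clock time in seconds.
    (Hypothetical helper; assumes a tar that supports -C.)
    """
    start = time.time()
    # Source tar: walk src_dir and write an archive stream to stdout.
    reader = subprocess.Popen(
        ["tar", "-cf", "-", "-C", src_dir, "."],
        stdout=subprocess.PIPE,
    )
    # Destination tar: read the stream from stdin and unpack into dst_dir.
    writer = subprocess.Popen(
        ["tar", "-xpf", "-", "-C", dst_dir],
        stdin=reader.stdout,
    )
    # Close our copy of the pipe so the reader gets SIGPIPE if the writer dies.
    reader.stdout.close()
    writer_rc = writer.wait()
    reader_rc = reader.wait()
    if reader_rc != 0 or writer_rc != 0:
        raise RuntimeError(
            "tar pipe failed (reader=%d, writer=%d)" % (reader_rc, writer_rc)
        )
    return time.time() - start
```

Dividing the number of megabytes copied by the returned time gives the kind of throughput figure quoted below.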

(1) We have verified that the disk load on the server is very low. The disk is not being saturated.

(2) We have verified that the ethernet on the server is not being saturated.

(3) The workstation is connected to the server through a switch, so it's not competing for bandwidth with anything else.


In theory, we should get about 10 megabytes/sec throughput, but what we're measuring is about 1 to 2 megs/sec.
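To put those numbers side by side, here is a small back-of-the-envelope sketch. It assumes the link is 100 Mbit/s Fast Ethernet, which the post does not state explicitly; the 10 MB/s and 1 to 2 MB/s figures are the ones quoted above, and the function names are made up for illustration.

```python
# Assumption: 100 Mbit/s Fast Ethernet link (not stated in the post).
LINK_MBIT_PER_S = 100

# Raw line rate in megabytes per second, before any protocol overhead.
RAW_MB_PER_S = LINK_MBIT_PER_S / 8.0

# Figures quoted in the post.
EXPECTED_MB_PER_S = 10.0
MEASURED_MB_PER_S = (1.0, 2.0)


def fraction_of_expected(measured, expected=EXPECTED_MB_PER_S):
    """Return the measured rate as a fraction of the expected rate."""
    return measured / expected


def seconds_to_copy(size_mb, rate_mb_per_s):
    """Return how long it takes to move size_mb megabytes at the given rate."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    print("raw line rate: %.1f MB/s" % RAW_MB_PER_S)
    for rate in MEASURED_MB_PER_S:
        print(
            "%.1f MB/s is %.0f%% of expected; 1000 MB takes %.0f s"
            % (rate, 100 * fraction_of_expected(rate), seconds_to_copy(1000, rate))
        )
```

Under these assumptions the measured rate is only a tenth to a fifth of what the link should carry.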


The workstation is using a single 120 gig WD IDE drive (WD1200JB), which, as I mentioned in other emails, should be able to do up to 30 megs/sec for writes.


While this tar-pipe is going on, the workstation is very unresponsive. "top" reports that kernel CPU usage is anywhere from 30% to 70%, but mostly around 40%. User space is using about 10%, though that also varies. Even though there is some idle time, the X cursor jumps about badly.

We're not compressing or anything. We're just doing the tar-pipe. Therefore, the workstation should be experiencing very little load while it transfers a mere 10 megs/sec to disk. Buffering in RAM should also allow the kernel to order writes efficiently.

Since the source tar process is talking to an NFS volume, the overhead of opening, reading, and closing small files could hurt throughput (it would have been better to rsh the source tar so that only the tar stream went over ethernet, through a single socket). But that should _reduce_ the amount of I/O being done, thereby reducing the work being done by the workstation. It would just WAIT more. It should not be unresponsive.


I would like to investigate this performance issue, but I don't know which tools to use. If anyone could give me some tips, I would be most appreciative.
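As one possible starting point, the kernel/user split that "top" reports can be sampled directly from /proc/stat. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and it relies on the aggregate "cpu" line listing user, nice, system, and idle jiffies as its first four fields (2.4 kernels have exactly these four; newer kernels append more, which are lumped together here).

```python
import time


def parse_cpu_line(stat_text):
    """Return (user, nice, system, idle, other) jiffies from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            user, nice, system, idle = values[:4]
            # Newer kernels add iowait, irq, softirq, steal after idle;
            # a 2.4 kernel has none of them, so this sum is then zero.
            other = sum(values[4:8])
            return (user, nice, system, idle, other)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Turn two parse_cpu_line() snapshots into percentages per category."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    names = ("user", "nice", "system", "idle", "other")
    return dict((name, 100.0 * d / total) for name, d in zip(names, deltas))


def sample(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, interval seconds apart, and return percentages."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return cpu_percentages(before, after)
```

Calling `sample()` while the tar-pipe is running would show how much of each interval goes to system time versus idle, which can be compared against the figures from "top".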

Thanks!
