Bandwidth cap reached for networked file systems on IBM xSeries
From: Jason Czerak
Date: Fri Sep 26 2008 - 23:25:17 EST
Please excuse my ignorance if this is the wrong list for this question;
my problem relates most directly to networked file systems. Please
direct me to the correct list if appropriate.
In production I'm working with a 3-node IBM x3850M2 cluster, with a
NetApp 6080 as the back-end NAS, connected over 10 Gigabit Ethernet
(Chelsio NICs, latest T3 drivers). Test and Integration are 2 x3650s.
I'm not very knowledgeable about the inner workings of mount and how
it interacts with NFS and Samba to create a file system path. I
believe that somewhere within this mechanism there is a bottleneck on
available bandwidth, and that this bottleneck varies with hardware
architecture. For example, the speed cap is much lower on the x3850
series server than it is on an x3650. I believe the differences in
FSB, memory paths and the other design choices that make the x3850 a
better multithreaded server than the x3650 contribute to this shifting
performance cap.
For a few weeks I've been throwing every combination I can think of
at these servers, confused as to why single-threaded or lightly
threaded performance overall is so slow! All the typical and
not-so-typical NFS settings are in place: RPC slots, rsize=32k,
somewhat larger TCP windows (RHEL 5.2 is close already), plus
suggestions from Chelsio. There isn't anything you can Google that
will make things faster here.
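One quick sanity check is confirming that the rsize/wsize actually in effect match what was requested, since the client and server negotiate these values. Below is a minimal sketch, assuming the kernel reports NFS options in /proc/mounts (device, mount point, fs type, options, ...); the function name nfs_mount_options is hypothetical, not part of any existing tool.

```python
import os


def nfs_mount_options(mounts_text: str) -> dict:
    """Map each NFS mount point to a dict of its options (e.g. rsize -> '32768').

    Hypothetical helper: parses text in /proc/mounts format, where the
    fields are device, mount point, fs type, options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not nfs/nfs4.
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            # Flag options like "rw" have no "=", so they map to "".
            key, _, value = opt.partition("=")
            opts[key] = value
        result[fields[1]] = opts
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mnt, opts in nfs_mount_options(f.read()).items():
            print(mnt, "rsize=", opts.get("rsize"), "wsize=", opts.get("wsize"),
                  "proto=", opts.get("proto"))
```

If the reported rsize is smaller than requested, the tuning is not what the mount is really using.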
The little TCP testing tool "ttcp", with 3 threads, can generate
9.6 Gb/s between nodes, regardless of server type. The OS and hardware
are fully capable of moving massive amounts of data. Even Apache, with
ZERO tuning and just 2 threads, can saturate 10 GigE.
Just a small subset of the tests: sio_ntap_linux (compiled for 64-bit)
showed a cap of 350 MB/sec from the NAS with 8 threads and 32K blocks.
I then tested with more threads and with 2 filers. Same cap! Naturally
I'm using 2 TCP paths here. I cannot pull more than 350 MB/sec of data
from any number of hosts, NetApp
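For anyone wanting to reproduce the shape of that test without sio, here is a rough sketch of a multi-threaded sequential reader (8 threads, 32K blocks by default). It is not sio_ntap_linux: every thread simply rereads the same whole file, and the function names are hypothetical. Python's own overhead may cap throughput below what a C tool reaches, so treat its numbers as a relative comparison between hosts, not an absolute measurement.

```python
import threading
import time

BLOCK_SIZE = 32 * 1024  # 32K reads, matching the sio run described above
THREADS = 8


def _read_whole_file(path: str, block_size: int, counts: list, idx: int) -> None:
    """Read `path` sequentially in `block_size` chunks; record bytes read."""
    total = 0
    # buffering=0 issues raw read() calls of block_size, like a simple I/O tool.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # EOF
                break
            total += len(chunk)
    counts[idx] = total


def parallel_read_throughput(path: str, threads: int = THREADS,
                             block_size: int = BLOCK_SIZE):
    """Return (total bytes read, aggregate MB/s) for `threads` concurrent readers."""
    counts = [0] * threads
    workers = [
        threading.Thread(target=_read_whole_file,
                         args=(path, block_size, counts, i))
        for i in range(threads)
    ]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start
    total = sum(counts)
    # MB here means 10**6 bytes.
    return total, (total / elapsed / 1e6) if elapsed > 0 else float("inf")
```

Running it against a large file on the NFS mount and again against the same file on local /dev/shm would show whether the ceiling follows the mount or the host.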
I moved to SMB, using /dev/shm and /dev/ram0 drives as stores to
avoid disk reads. Some tests included making sure everything was in
cache/buffer first with a quick cat file >> /dev/null. Same results.
I had that cap of
The next logical test was to use SIO to pull data from SMB and the
NetApp. Same cap. So my problem is narrowed down to whatever mounts
these 2 file systems.
The same tests on the x3650 yield a cap of about 420-470 MB/sec.
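To put those caps next to the link rate, a small conversion helps. This assumes decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits); if the tools report binary megabytes, the figures come out roughly 5% higher.

```python
LINK_GBIT = 10.0  # nominal 10 Gigabit Ethernet line rate


def mb_per_s_to_gbit(mb_per_s: float) -> float:
    """Decimal megabytes/second to gigabits/second."""
    return mb_per_s * 8 / 1000


def gbit_to_mb_per_s(gbit: float) -> float:
    """Gigabits/second to decimal megabytes/second."""
    return gbit * 1000 / 8


# Caps reported for each server type versus the 10GbE line rate.
for label, mb_s in [("x3850M2 cap", 350), ("x3650 cap (low)", 420),
                    ("x3650 cap (high)", 470)]:
    gbit = mb_per_s_to_gbit(mb_s)
    print(f"{label}: {mb_s} MB/s = {gbit:.2f} Gb/s, "
          f"{100 * gbit / LINK_GBIT:.0f}% of 10GbE")

# What ttcp achieved, expressed in the same units as the sio results.
print(f"ttcp 9.6 Gb/s = {gbit_to_mb_per_s(9.6):.0f} MB/s")
```

Under these assumptions the file system path tops out at under 40% of the line rate, while raw TCP reaches roughly 1200 MB/s.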