Re: [RFC] nfs: use 4*rsize readahead size

From: Dean Hildebrand
Date: Wed Apr 14 2010 - 17:22:33 EST


You cannot simply update the Linux system TCP parameters and expect NFS to perform well over the WAN. The NFS server does not use the system TCP parameters. This is a long-standing issue. A patch added in 2.6.30 enabled NFS to use Linux TCP buffer autotuning, which would have resolved the issue, but a regression was reported (http://thread.gmane.org/gmane.linux.kernel/826598 ) and so the patch was removed.

Maybe it's time to rethink allowing users to manually set Linux NFS server TCP buffer sizes? Years have passed on this subject and people are still waiting. Good performance over the WAN will require manually setting TCP buffer sizes. As mentioned in the regression thread, autotuning can reduce performance by up to 10%. Here is a patch (slightly outdated) that creates 2 sysctls that allow users to manually set NFS TCP buffer sizes. The first link also has a fair amount of background information on the subject.
http://www.spinics.net/lists/linux-nfs/msg01338.html
http://www.spinics.net/lists/linux-nfs/msg01339.html

Dean


Wu Fengguang wrote:
On Wed, Mar 03, 2010 at 02:42:19AM +0800, Trond Myklebust wrote:
On Tue, 2010-03-02 at 12:33 -0500, John Stoffel wrote:
"Trond" == Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> writes:
Trond> On Tue, 2010-03-02 at 11:10 +0800, Wu Fengguang wrote:
Dave,

Here is one more test on a big ext4 disk file:

             16k    39.7 MB/s
             32k    54.3 MB/s
             64k    63.6 MB/s
            128k    72.6 MB/s
            256k    71.7 MB/s
rsize ==>   512k    71.7 MB/s
           1024k    72.2 MB/s
           2048k    71.0 MB/s
           4096k    73.0 MB/s
           8192k    74.3 MB/s
          16384k    74.5 MB/s

It shows that >=128k client side readahead is enough for the single disk
case :) As for RAID configurations, I guess big server side readahead
should be enough.
Trond> There are lots of people who would like to use NFS on their
Trond> company WAN, where you typically have high bandwidths (up to
Trond> 10GigE), but often a high latency too (due to geographical
Trond> dispersion). My ping latency from here to a typical server in
Trond> NetApp's Bangalore office is ~ 312ms. I read your test results
Trond> with 10ms delays, but have you tested with higher than that?

If you have that high a latency, the low level TCP protocol is going
to kill your performance before you get to the NFS level. You really
need to open up the TCP window size at that point. And it only gets
worse as the bandwidth goes up too.
Yes. You need to open the TCP window in addition to reading ahead
aggressively.
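
The window requirement discussed here follows from the bandwidth-delay product: to keep a link full, roughly bandwidth x round-trip time bytes must be in flight. A minimal sketch in Python, using the 10GigE and ~312ms figures quoted in this thread. The function name is made up for illustration, and the ping latency is assumed to be the full round-trip time:

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bytes that must be in flight to fill a link (bandwidth-delay product)."""
    return bandwidth_bits_per_s / 8 * rtt_s

# 10 Gbit/s link with a 312 ms round trip (figures quoted in this thread)
window = bdp_bytes(10e9, 0.312)
print(f"{window / 1e6:.0f} MB")  # prints "390 MB"
```

Under these assumptions, a single connection would need a window of roughly 390 MB to saturate the link, which is why readahead alone cannot help once the TCP window is the limit.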

I only get ~10MB/s throughput with the following settings.

# huge NFS ra size
echo 89512 > /sys/devices/virtual/bdi/0:15/read_ahead_kb

# on both sides
/sbin/tc qdisc add dev eth0 root netem delay 200ms

net.core.rmem_max = 873800000
net.core.wmem_max = 655360000
net.ipv4.tcp_rmem = 8192 87380000 873800000
net.ipv4.tcp_wmem = 4096 65536000 655360000

Did I miss something?
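
One way to read the ~10MB/s figure is to invert the usual relation throughput = window / RTT. A back-of-envelope sketch in Python, assuming the 200ms netem delay on each side adds up to a round trip of about 400ms, that this delay dominates the RTT, and that MB means 10^6 bytes. The function name is hypothetical:

```python
def implied_window_bytes(throughput_bytes_per_s, rtt_s):
    """Data in flight implied by an observed throughput and round-trip time."""
    return throughput_bytes_per_s * rtt_s

# 200 ms netem delay on both sides: assume roughly 400 ms round trip
rtt = 2 * 0.200
in_flight = implied_window_bytes(10e6, rtt)
print(f"{in_flight / 1e6:.1f} MB in flight")  # prints "4.0 MB in flight"
```

If these assumptions hold, only about 4 MB is in flight at a time, far below the tcp_rmem and tcp_wmem maximums set above, so something other than those sysctls may be limiting the effective window.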

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html