W/RTT verification, linux tcp buffers behaviour

From: Constantinos Makassikis
Date: Fri May 12 2006 - 04:17:29 EST


Here's my problem:

I am trying to verify the formula :

W/RTT = Max Throughput

between two end-hosts belonging to the same private network.

Where:

RTT stands for Round Trip Time
W = min(CWND, AW, SNDBUF)
CWND : the size of the congestion window
AW : the size of the receiver's advertized window
SNDBUF : the size of the send buffer
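
To make the units explicit, here is a minimal sketch of the formula in Python. The function name and the example CWND value are hypothetical. It assumes 1 KByte = 1024 bytes and 1 Mbit = 10^6 bits, which is the convention that reproduces the W/RTT column of the table further down.

```python
def max_throughput_mbits(cwnd_bytes, aw_bytes, sndbuf_bytes, rtt_ms):
    """Return W/RTT in Mbits/sec, with W = min(CWND, AW, SNDBUF)."""
    w_bytes = min(cwnd_bytes, aw_bytes, sndbuf_bytes)
    rtt_s = rtt_ms / 1000.0
    # bytes -> bits, then bits/sec -> Mbits/sec (1 Mbit = 10^6 bits, assumed)
    return w_bytes * 8 / rtt_s / 1e6


# Example: 8 KByte send buffer, 12.7 ms RTT.
# CWND is a hypothetical large value so that it is not the limiting term.
cwnd = 1 << 30
aw = 6293248          # largest advertized window seen in the traces
sndbuf = 8 * 1024     # 8 KBytes
print(round(max_throughput_mbits(cwnd, aw, sndbuf, 12.7), 2))  # about 5.16
```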

To do this I made various bandwidth measurements between the
two hosts. More specifically, I fixed the receiver's receive buffer
at 16 MBytes and varied the sender's send buffer between 8 KBytes
and 16 MBytes.

As can be seen below, both hosts are capable machines, connected
to the private network through Gigabit Ethernet.

Hosts' configuration:
--------------------------

Debian Linux 2.6.12-1-amd64-k8-smp
AMD Opteron 246/248
2GB RAM
80 GB HDD
Gigabit Ethernet
Tcp specific options that are set via sysctl can be found at the end
of this letter.


As for the network itself, it appears to be of excellent quality: over
the whole set of experiments no retransmission is reported and the RTT
stays between 12 and 13 milliseconds.

Normally one shouldn't expect to get very close to W/RTT, but given
the quality of both the network (no losses, very stable RTT) and the end
hosts, it is surprising to reach at best only 70 % of W/RTT (see below for results).

Bandwidth is measured with Iperf tool
Tcp buffer sizes are set with Iperf tool (via setsockopt() )
Traffic is dumped with tcpdump on both end hosts
Traffic statistics from tcpdump traces are provided by tcptrace tool

The tcpdump traces taken for each transfer confirm the network quality.

Here are some figures:

 RTT  SNDBUF  RCVBUF  MAX SND   MAX AW  Iperf    W/RTT      %
-------------------------------------------------------------
12,7       8   16384        8  6293248   3,57     5,16  69,18
12,7      16   16384    10,76  6293248    6,9    10,32  66,85
12,7      32   16384     21,4  6293248   13,7    20,64  66,37
12,7      64   16384     31,5  6293248   26,7    41,28  64,67
12,7     128   16384     49,5  6293248   54,4    82,56  65,88
12,7     256   16384      213  6293248    105   165,13  63,58
12,7     512   16384      266  6293248    171   330,26  51,77
12,7    1024   16384        -  6293248    382   660,52  57,83
12,7    2048   16384        -  6293248    673  1321,04  50,94
12,7    4096   16384        -  6293248    905  2642,08  34,25

RTT     : round trip time (milliseconds)
SNDBUF  : size of the TCP send buffer (KBytes)
RCVBUF  : size of the TCP receive buffer (KBytes)
MAX SND : the average amount of data sent per RTT (KBytes),
          estimated from the tcpdump traces
MAX AW  : maximum size of the advertized window, taken from the
          tcpdump traces (bytes; 6293248 bytes is about 6 MBytes)
Iperf   : throughput reported by the Iperf tool (Mbits/sec)
W/RTT   : maximum reachable throughput (Mbits/sec)
%       : Iperf throughput as a percentage of W/RTT

Even though only the maximum size of the advertized window is reported,
the advertized window actually grows larger than SNDBUF within a few RTTs,
so I assumed it is safe to take W = min(CWND, SNDBUF). Since no
retransmissions are detected, CWND also grows beyond the size of SNDBUF, so
I took W = SNDBUF to compute W/RTT.

As can be seen, we barely reach 70 % of the value predicted by the formula,
apparently because MAX SND remains relatively low compared to SNDBUF.
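
For reference, the W/RTT and % columns can be recomputed from the SNDBUF and Iperf columns with a short sketch like the one below. It takes W = SNDBUF as described above and assumes 1 KByte = 1024 bytes and 1 Mbit = 10^6 bits. The names are hypothetical, and the last field (whether W/RTT exceeds a nominal 1000 Mbits/sec Gigabit Ethernet rate, ignoring framing overhead) is an extra check that is not part of the original measurements.

```python
RTT_MS = 12.7
LINK_MBITS = 1000.0  # nominal Gigabit Ethernet rate (assumption, no framing overhead)

# (SNDBUF in KBytes, Iperf throughput in Mbits/sec), copied from the table
ROWS = [
    (8, 3.57), (16, 6.9), (32, 13.7), (64, 26.7), (128, 54.4),
    (256, 105.0), (512, 171.0), (1024, 382.0), (2048, 673.0), (4096, 905.0),
]


def w_over_rtt_mbits(sndbuf_kbytes, rtt_ms=RTT_MS):
    """W/RTT in Mbits/sec, taking W = SNDBUF."""
    return sndbuf_kbytes * 1024 * 8 / (rtt_ms / 1000.0) / 1e6


def summarize(rows):
    """Return (SNDBUF, W/RTT, percent of W/RTT reached, W/RTT above link rate?)."""
    result = []
    for sndbuf_kbytes, iperf_mbits in rows:
        bound = w_over_rtt_mbits(sndbuf_kbytes)
        percent = 100.0 * iperf_mbits / bound
        result.append((sndbuf_kbytes, round(bound, 2), round(percent, 2),
                       bound > LINK_MBITS))
    return result


for row in summarize(ROWS):
    print(row)
```

Because the Iperf figures in the table are rounded, the recomputed percentages may differ from the table in the last digit. With these assumptions, W/RTT for the 2048 and 4096 KByte rows comes out above the nominal link rate, so for those rows the link itself, rather than W, may be what bounds the throughput.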

Hereafter lie some questions.

Questions:
--------------

1) Am I missing or misunderstanding something ?
2) Do you have any other ideas which could explain the low percentage reached ?
3) Supposing the low percentage is really due to the fact that the
sender's buffer isn't fully used, why isn't it used to its fullest ?
Is there some way to overcome this ?

Misc Questions:
--------------------

That is, questions I tried to answer myself by searching the
Internet, but for which I found no satisfactory answer, or no
answer at all.

4) Why does the advertized window grow steadily until it reaches 6
MBytes, instead of being set to 6 MBytes directly at the
beginning of the connection ?
5) Why does the advertized window remain stuck at 6 MBytes ?
6) Why does the kernel allocate twice the buffer size
requested by setsockopt() ?
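
The behaviour behind question 6 can be observed without Iperf. The following is a minimal sketch in Python, with a hypothetical helper name, that requests a send buffer size with setsockopt() and reads back what the kernel reports with getsockopt(). It only shows the observation and does not explain it. The value returned depends on the operating system and on limits such as net/core/wmem_max, so no particular output is assumed here.

```python
import socket


def requested_vs_granted(requested_bytes):
    """Request SO_SNDBUF on a fresh TCP socket and return (requested, reported)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
        reported = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return requested_bytes, reported


if __name__ == "__main__":
    for kbytes in (8, 16, 32, 64):
        requested, reported = requested_vs_granted(kbytes * 1024)
        print("requested %6d bytes, kernel reports %6d bytes" % (requested, reported))
```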

Thank you in advance,

Constantinos


###################
# /etc/sysctl.conf #
###################

# I mainly disabled ecn, fack, dsack, autotuning
# Left rfc1323 as well as sack enabled
# Left only TCP Reno (i.e.: disabled bictcp, vegas, ...)

net/ipv4/tcp_tso_win_divisor=8
net/ipv4/tcp_moderate_rcvbuf=0
net/ipv4/tcp_bic=0
net/ipv4/tcp_vegas_cong_avoid=0
net/ipv4/tcp_westwood=0
net/ipv4/tcp_no_metrics_save=0
net/ipv4/tcp_low_latency=0
net/ipv4/tcp_frto=0
net/ipv4/tcp_tw_reuse=0
net/ipv4/tcp_adv_win_scale=2
net/ipv4/tcp_app_win=31
net/ipv4/tcp_dsack=0
net/ipv4/tcp_ecn=0
net/ipv4/tcp_reordering=3
net/ipv4/tcp_fack=0
net/ipv4/tcp_orphan_retries=0
net/ipv4/tcp_max_syn_backlog=1024
net/ipv4/tcp_rfc1337=0
net/ipv4/tcp_stdurg=0
net/ipv4/tcp_abort_on_overflow=0
net/ipv4/tcp_tw_recycle=0
net/ipv4/tcp_syncookies=0
net/ipv4/tcp_fin_timeout=60
net/ipv4/tcp_retries2=15
net/ipv4/tcp_retries1=3
net/ipv4/tcp_keepalive_intvl=75
net/ipv4/tcp_keepalive_probes=9
net/ipv4/tcp_keepalive_time=7200
net/ipv4/tcp_max_tw_buckets=180000
net/ipv4/tcp_max_orphans=65536
net/ipv4/tcp_synack_retries=5
net/ipv4/tcp_syn_retries=5
net/ipv4/tcp_retrans_collapse=1
net/ipv4/tcp_sack=1
net/ipv4/tcp_window_scaling=1
net/ipv4/tcp_timestamps=1
net/core/rmem_default=8388608
net/core/rmem_max=8388608
net/core/wmem_default=8388608
net/core/wmem_max=8388608
net/ipv4/ip_no_pmtu_disc=0
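
To check that the running kernel actually uses the settings listed above, a small sketch like the following can compare them with the live values under /proc/sys. The function names are hypothetical. It assumes keys written with slashes, as in the listing above, and single-valued settings.

```python
import os


def parse_sysctl(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def live_value(key, root="/proc/sys"):
    """Return the running value for a slash-separated key, or None if unreadable."""
    try:
        with open(os.path.join(root, key)) as f:
            return f.read().strip()
    except OSError:
        return None


def mismatches(text, root="/proc/sys"):
    """Return {key: (configured, live)} for settings that differ or are missing."""
    result = {}
    for key, configured in parse_sysctl(text).items():
        live = live_value(key, root)
        if live != configured:
            result[key] = (configured, live)
    return result
```

For example, mismatches(open("/etc/sysctl.conf").read()) would list every setting whose running value differs from the file, including options that do not exist on the running kernel (reported with a live value of None).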