Re: `tcpdump` host-host

David S. Miller (davem@dm.cobaltmicro.com)
Mon, 23 Mar 1998 17:03:05 -0800


I checked out these dumps, and here is what I see. I am assuming that
204.178.40.143 is your Sun box, and 204.178.40.224 is the Linux-2.1.90
machine, am I correct?

Anyways, I think the bug is in the TCP code running on the Sun. Here
is an example snippet:

204.178.40.143.19 > 204.178.40.224.1119: P 16990:17064(74) ack 1 win 8760 (DF) (ttl 255, id 38847)
204.178.40.143.19 > 204.178.40.224.1119: . 17064:18470(1406) ack 1 win 8760 (DF) (ttl 255, id 38848)
204.178.40.143.19 > 204.178.40.224.1119: P 18470:18544(74) ack 1 win 8760 (DF) (ttl 255, id 38849)
204.178.40.143.19 > 204.178.40.224.1119: . 18544:19950(1406) ack 1 win 8760 (DF) (ttl 255, id 38850)
204.178.40.143.19 > 204.178.40.224.1119: P 19950:20024(74) ack 1 win 8760 (DF) (ttl 255, id 38851)
204.178.40.143.19 > 204.178.40.224.1119: . 20024:21430(1406) ack 1 win 8760 (DF) (ttl 255, id 38852)
204.178.40.143.19 > 204.178.40.224.1119: P 21430:21504(74) ack 1 win 8760 (DF) (ttl 255, id 38853)
204.178.40.143.19 > 204.178.40.224.1119: . 21504:22910(1406) ack 1 win 8760 (DF) (ttl 255, id 38854)
204.178.40.143.19 > 204.178.40.224.1119: P 22910:22984(74) ack 1 win 8760 (DF) (ttl 255, id 38855)

I'm sorry, that shows completely bogus data packaging being done by the
BSD networking code running on the Sun (I assume it's running SunOS, the
data length behavior is very consistent with that of 4.3 BSD stacks).
And there are even worse segments:

204.178.40.143.19 > 204.178.40.224.1119: P 1450:1672(222) ack 1 win 8760 (DF) (ttl 255, id 38826)
204.178.40.143.19 > 204.178.40.224.1119: P 1672:1722(50) ack 1 win 8760 (DF) (ttl 255, id 38827)
204.178.40.143.19 > 204.178.40.224.1119: . 1722:3152(1430) ack 1 win 8760 (DF) (ttl 255, id 38828)
204.178.40.143.19 > 204.178.40.224.1119: P 3152:4484(1332) ack 1 win 8760 (DF) (ttl 255, id 38829)
204.178.40.143.19 > 204.178.40.224.1119: P 4484:4520(36) ack 1 win 8760 (DF) (ttl 255, id 38830)
204.178.40.143.19 > 204.178.40.224.1119: . 4520:5964(1444) ack 1 win 8760 (DF) (ttl 255, id 38831)

Bleech... In contrast, look at how Linux boxes queue things in the same
situation:

204.178.40.236.19 > 204.178.40.224.1118: P 22280:23716(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31937)
204.178.40.236.19 > 204.178.40.224.1118: P 23716:25152(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31938)
204.178.40.236.19 > 204.178.40.224.1118: P 25152:26588(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31939)
204.178.40.224.1118 > 204.178.40.236.19: . ack 26588 win 26064 <nop,nop,timestamp 6063798 5148729> (DF) (ttl 64, id 35036)
204.178.40.236.19 > 204.178.40.224.1118: P 26588:28024(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31940)
204.178.40.236.19 > 204.178.40.224.1118: P 28024:29460(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31941)
204.178.40.236.19 > 204.178.40.224.1118: P 29460:30896(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31942)
204.178.40.224.1118 > 204.178.40.236.19: . ack 30896 win 31856 <nop,nop,timestamp 6063798 5148729> (DF) (ttl 64, id 35037)
204.178.40.236.19 > 204.178.40.224.1118: P 30896:32332(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31943)
204.178.40.236.19 > 204.178.40.224.1118: P 32332:33768(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31944)
204.178.40.236.19 > 204.178.40.224.1118: P 33768:35204(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31945)
204.178.40.224.1118 > 204.178.40.236.19: . ack 35204 win 31856 <nop,nop,timestamp 6063799 5148729> (DF) (ttl 64, id 35038)
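For anyone who wants to compare the segment sizes in these traces
numerically rather than by eye, here is a minimal sketch in Python. It
only assumes the tcpdump text format shown in the dumps above; the
function name payload_lengths and the SAMPLE string are made up for
illustration, and the sample lines are copied from the snippets in this
mail.

```python
import re

# Matches the "start:end(len)" field that tcpdump prints for data segments.
SEG_RE = re.compile(r"\b(\d+):(\d+)\((\d+)\)")

def payload_lengths(dump_text):
    """Return the payload length of every data segment in the dump text."""
    lengths = []
    for line in dump_text.splitlines():
        m = SEG_RE.search(line)
        if m is None:
            continue  # pure ACKs have no start:end(len) field
        start, end, length = map(int, m.groups())
        if end - start != length:
            raise ValueError(f"inconsistent sequence range: {line!r}")
        lengths.append(length)
    return lengths

# Two Sun segments, one Linux segment and one pure ACK from the dumps above.
SAMPLE = """\
204.178.40.143.19 > 204.178.40.224.1119: P 16990:17064(74) ack 1 win 8760 (DF) (ttl 255, id 38847)
204.178.40.143.19 > 204.178.40.224.1119: . 17064:18470(1406) ack 1 win 8760 (DF) (ttl 255, id 38848)
204.178.40.236.19 > 204.178.40.224.1118: P 22280:23716(1436) ack 1 win 31856 <nop,nop,timestamp 5148729 6063797> (DF) (ttl 64, id 31937)
204.178.40.224.1118 > 204.178.40.236.19: . ack 26588 win 26064 <nop,nop,timestamp 6063798 5148729> (DF) (ttl 64, id 35036)
"""

if __name__ == "__main__":
    print(payload_lengths(SAMPLE))
```

Feeding it a whole capture file and counting the lengths makes the
alternating 74/1406 pattern from the Sun easy to tell apart from the
steady 1436 byte segments from the Linux sender.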

Later on the timing jitters a bit, and we begin to queue smaller
frames, but we do act consistently nonetheless. Here's a snippet of
this behavior:

204.178.40.236.19 > 204.178.40.224.1118: P 59718:60458(740) ack 1 win 31856 <nop,nop,timestamp 5148732 6063799> (DF) (ttl 64, id 31965)
204.178.40.236.19 > 204.178.40.224.1118: P 60458:61198(740) ack 1 win 31856 <nop,nop,timestamp 5148732 6063799> (DF) (ttl 64, id 31966)
204.178.40.236.19 > 204.178.40.224.1118: P 61198:61938(740) ack 1 win 31856 <nop,nop,timestamp 5148732 6063799> (DF) (ttl 64, id 31967)
204.178.40.236.19 > 204.178.40.224.1118: P 61938:62678(740) ack 1 win 31856 <nop,nop,timestamp 5148732 6063799> (DF) (ttl 64, id 31968)
204.178.40.224.1118 > 204.178.40.236.19: . ack 62678 win 28960 <nop,nop,timestamp 6063801 5148732> (DF) (ttl 64, id 35045)

It's a side effect of the speed of the ACK clock coming back to
the sender, and how quickly the process does its writes. Essentially
the congestion window has opened up wide enough that the sending TCP
can put the packets onto the wire before the process can sneak in and
do another write. If the process gets there early enough, we can
tack his short write onto the end of a packet which still has not gone
out. This _is_ happening in the first Linux snippet above, which is
why the data gets clumped into nice 1436-byte packets.
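The coalescing described above can be illustrated with a toy model in
Python. This is only a sketch, not the kernel's logic: the function
coalesce_writes and its drain_every parameter are hypothetical, and the
74 byte write size is an assumption inferred from the 74 byte segments
in the Sun dump. Each write is appended to the unsent tail segment
until it reaches the MSS; drain_every stands in for the tail packet
going out onto the wire after that many writes.

```python
def coalesce_writes(write_sizes, mss, drain_every=None):
    """Return the payload sizes of the segments built from the writes.

    Each write is appended to the unsent tail segment; a segment is
    emitted as soon as it holds mss bytes. If drain_every is set, the
    partially filled tail is also emitted after every drain_every writes,
    modelling the sender getting the packet out before the next write.
    """
    segments = []
    tail = 0  # bytes queued in the tail segment that has not gone out yet
    for count, size in enumerate(write_sizes, start=1):
        while size > 0:
            take = min(mss - tail, size)
            tail += take
            size -= take
            if tail == mss:
                segments.append(tail)  # full-sized segment goes out
                tail = 0
        if drain_every and count % drain_every == 0 and tail > 0:
            segments.append(tail)  # tail sent before the process writes again
            tail = 0
    if tail:
        segments.append(tail)
    return segments

if __name__ == "__main__":
    writes = [74] * 40  # assumed: forty 74-byte writes
    print(coalesce_writes(writes, mss=1436))
    print(coalesce_writes(writes, mss=1436, drain_every=10))
    print(coalesce_writes(writes, mss=1436, drain_every=1))
```

In this model, when the process always gets its write in before the
tail goes out, the segments fill to 1436 bytes. With the tail going out
after every ten writes the segments come out at 740 bytes (10 x 74),
and with it going out after every write each 74 byte write becomes its
own packet.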

There is a little comment in tcp_do_sendmsg() which mentions that we
might want to come up with a heuristic which sometimes
intelligently holds back the sends a little to allow the full
collapsing of data to occur (i.e. long enough for the process to do
a few more tiny writes).

Anyways, at this point I say it's a SunOS bug... if it's going to
package the writes into packets of that size, there isn't much our
ACKing policy as a receiver can do to alleviate the situation.
(BTW: For the SunOS-->Linux dumps, where did you do the sniffing, on
the Linux machine, on the SunOS one, or on some other host? If on
some other host, where was it in relation to the two machines doing
the transfer?)

Later,
David S. Miller
davem@dm.cobaltmicro.com
