Re: TCP_MAXSEG vs TCP/generic segmentation offload (tso/gso)

From: Eric Dumazet
Date: Thu Nov 25 2010 - 09:27:44 EST


On Thursday, 25 November 2010 at 14:44 +0100, Niels Möller wrote:
> [ This is a slightly updated repost of an October 21 mail to the
> linux-net list. Any hints or advice appreciated. /Niels ]
>

CC netdev

> I have been observing large ethernet packets when generating TCP traffic
> over a local ethernet, up to a bit over 20000 bytes, even though the
> interface MTU is 1500 bytes.
>
> Furthermore, I tried to use setsockopt with TCP_MAXSEG to limit the TCP
> segment size further, to 1000 bytes, and that didn't have any effect.
>
> When bugreporting a related problem to the debian kernel maintainers, I
> was told that the behaviour may be linked to the use of TCP segmentation
> offload (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=600286).
>
> Disabling TSO and GSO using ethtool solves both problems: Generated
> packets now are limited in size by both the interface MTU and the
> segment size set with setsockopt. (Except the atl1c driver, where
> ethtool -K eth0 tso off only results in a "Cannot set device tcp
> segmentation offload settings: Operation not supported").
>
> Before I try to write proper bug reports on specific network drivers (I
> have seen problems with several network drivers on different machines,
> unfortunately using different linux versions), I would like to know:
>
> 1. Is TCP_MAXSEG supposed to work at all with network drivers that do
> tcp segmentation offload?
>
> 2. If it is supposed to work, can someone give a rough sketch on how the
> per-socket segment size, set with setsockopt(... TCP_MAXSEG,...), is
> passed down to the driver and to the network hardware? I suspect it
> ought to be passed with each "pseudo-packet" to be transmitted.
>
> I have spent some time searching the documentation and the net for
> answers, without result, hence I'm posting to this list. I'm not
> subscribed, so please cc any replies.
>
> (Regarding packets larger than the interface MTU, that seems clearly
> buggy to me, and I think I already know enough to be able to file proper
> bug reports. And in the atl1c driver, it appears to have been fixed
> between 1.0.0.1-NAPI and 1.0.1.0-NAPI).

GSO is a software technique, and so is GRO.

Physical frames are indeed 1500 bytes (on regular Ethernet links).

tcpdump gives you the high-level view in the transmit path, before the
segmentation done at lower levels (by the NIC itself or in the Linux stack).
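
To make the arithmetic concrete, here is a rough sketch (not kernel code)
of how one large "super-packet" seen by tcpdump maps to wire-sized
segments. The function name and the header sizes are assumptions for
illustration: IPv4 without options, and TCP with the 12-byte timestamp
option, on a 1500-byte MTU.

```python
def split_into_segments(payload_len, mss):
    """Return the payload sizes of the wire segments for one large send."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    full, rest = divmod(payload_len, mss)
    sizes = [mss] * full
    if rest:
        sizes.append(rest)  # trailing partial segment, if any
    return sizes


# Assumed numbers: 1500-byte MTU, 20-byte IPv4 header, 20-byte TCP header,
# 12 bytes of TCP timestamp option, leaving 1448 bytes of payload.
MTU = 1500
MSS = MTU - 20 - 20 - 12

sizes = split_into_segments(20000, MSS)
print(len(sizes), sizes[-1])  # 14 segments, the last one carries 1176 bytes
```

So a 20000-byte packet in a capture would correspond to 14 frames on the
wire under these assumptions, none of them larger than the MTU.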

We also have GRO in the receive path, which can coalesce several 1500-byte
frames of the same TCP flow into a single larger packet, so that per-packet
overhead in the stack is lowered (netfilter, IP stack, TCP stack, bridge,
routing ...).
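
The coalescing idea can be illustrated with a toy model. This is only a
sketch: the Segment type and the coalesce() helper are hypothetical names,
and real GRO checks much more (TCP flags, header options, a size cap)
before merging.

```python
from collections import namedtuple

# Hypothetical model of a received TCP segment: flow key, sequence number,
# payload length in bytes.
Segment = namedtuple("Segment", "flow seq length")


def coalesce(segments):
    """Merge back-to-back in-order segments of the same flow."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None
                and last.flow == seg.flow
                and last.seq + last.length == seg.seq):
            # Contiguous data on the same flow: grow the previous packet.
            merged[-1] = last._replace(length=last.length + seg.length)
        else:
            merged.append(seg)
    return merged


flow_a = ("10.0.0.1", 5000, "10.0.0.2", 80)
flow_b = ("10.0.0.3", 6000, "10.0.0.2", 80)
frames = [
    Segment(flow_a, 0, 1448),
    Segment(flow_a, 1448, 1448),
    Segment(flow_a, 2896, 1448),
    Segment(flow_b, 0, 500),
]
packets = coalesce(frames)
print([p.length for p in packets])  # [4344, 500]
```

The upper layers then handle two packets instead of four, which is where
the saving comes from, and it is also why a capture on the receiver can
show packets larger than the MTU.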

So... there is no 'bug', unless you trust tcpdump output too much.


