Re: Extremely slow network with e1000 & ip_conntrack

From: David S. Miller
Date: Fri Dec 05 2003 - 15:32:12 EST


On Thu, 04 Dec 2003 21:36:09 +0900
Stephen Lee <mukansai@xxxxxxxxxxxxx> wrote:

> "Feldman, Scott" <scott.feldman@xxxxxxxxx> wrote:
> >
> > Try turning off TSO by disabling this code or by using "ethtool -K <dev>
> > tso off" (need version 1.8).
>
> Yes, turning off TSO with ethtool fixed it (tested on 2.6.0-test11). At
> least we have a workaround now.

OK, I've found out what IP conntrack does to cause the problem.

In fact, it's a bug in conntrack, and it ends up corrupting the TSO
packet. That forces TSO to be disabled on the connection and all of the
data to be retransmitted. Once the data flows correctly again, TSO is
re-enabled, and the cycle repeats over and over. Performance goes into
the toilet.

The culprit is ip_refrag() in net/ipv4/netfilter/ip_conntrack_standalone.c,
which does this:

	if ((*pskb)->len > dst_pmtu(&rt->u.dst)) {
		/* No hook can be after us, so this should be OK. */
		ip_fragment(*pskb, okfn);
		return NF_STOLEN;
	}

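That fragments TSO packets, oops :) A TSO frame is deliberately larger
than the path MTU, because the card is supposed to cut it into MSS-sized
segments itself, so this test always fires on it.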

People can confirm this analysis by applying the patch below, enabling
TSO with conntrack loaded, and seeing whether the problem goes away.

Some auditing of how TSO and netfilter interact is definitely necessary.
In particular, I am incredibly confident that we have issues in cases
such as when the FTP netfilter modules mangle the data. Other areas to
inspect are the cases where TCP header bits are changed and the checksum
therefore needs to be adjusted.

===== net/ipv4/netfilter/ip_conntrack_standalone.c 1.22 vs edited =====
--- 1.22/net/ipv4/netfilter/ip_conntrack_standalone.c Thu Oct 2 23:21:19 2003
+++ edited/net/ipv4/netfilter/ip_conntrack_standalone.c Fri Dec 5 12:25:22 2003
@@ -201,7 +201,8 @@
 	/* Local packets are never produced too large for their
 	   interface. We degfragment them at LOCAL_OUT, however,
 	   so we have to refragment them here. */
-	if ((*pskb)->len > dst_pmtu(&rt->u.dst)) {
+	if ((*pskb)->len > dst_pmtu(&rt->u.dst) &&
+	    !skb_shinfo(*pskb)->tso_size) {
 		/* No hook can be after us, so this should be OK. */
 		ip_fragment(*pskb, okfn);
 		return NF_STOLEN;