Re: poll() blocked / packets not received ?
From: Nicolas Cannasse
Date: Mon Oct 20 2008 - 06:47:16 EST
>> We have Shorewall installed and enabled, but what seems strange is that
>> the problem depends on multithreading. It also occurs much more often on
>> the 4-core machines than on the 2-core ones (both with Hyperthreading
>> activated). We're using kernel 2.6.20-15-server (#2 SMP) provided by Ubuntu.
>> Any tip on how we could fix that or investigate further would be
>> appreciated. After one month of debugging we're really out of solutions now.
>
> Your usage pattern is a very common one. I highly doubt you are experiencing
> a kernel bug here, or many people (including myself) would be complaining.
> Shorewall sounds like it might be suspect. Are FINs not coming in when the
> remote closes? You can look in the output of netstat to see what state the
> TCP connection is in. Is it still ESTABLISHED?

Yes, it's still ESTABLISHED, but we can't see the corresponding
connection when running netstat on the other machine. I'm not a TCP
expert, so I'm not sure in which cases this can occur.
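
The netstat comparison described above can be automated. What follows is only a minimal sketch, assuming Linux hosts with IPv4 connections and a little-endian CPU (as on x86): it decodes text in the /proc/net/tcp format, which is where netstat gets its data, and reports connections that are ESTABLISHED on one machine with no mirrored entry on the other. The function names (decode_addr, parse_proc_net_tcp, half_open) are hypothetical and not part of any existing tool.

```python
import socket
import struct

# State codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(hex_addr):
    """Turn '0100007F:0050' into ('127.0.0.1', 80).

    Assumption: the address was printed on a little-endian host.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_proc_net_tcp(text):
    """Return a list of (local, remote, state) tuples from /proc/net/tcp text."""
    conns = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        cols = line.split()
        if len(cols) < 4:
            continue
        local = decode_addr(cols[1])
        remote = decode_addr(cols[2])
        state = TCP_STATES.get(cols[3].upper(), cols[3])
        conns.append((local, remote, state))
    return conns


def half_open(here, there):
    """Connections ESTABLISHED 'here' that have no mirrored entry 'there'."""
    # An entry (local=B, remote=A) on the peer mirrors (local=A, remote=B) here.
    mirrored = {(remote, local) for local, remote, _ in there}
    return [
        (local, remote)
        for local, remote, state in here
        if state == "ESTABLISHED" and (local, remote) not in mirrored
    ]
```

To use it, save the contents of /proc/net/tcp on both machines at roughly the same time, pass each through parse_proc_net_tcp, and call half_open on the two results. Any pair it returns matches the symptom described here: one side still believes the connection is open while the other has already dropped it.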
I agree with your comment in general, except that we have been running
the same application in a single-threaded environment for years without
running into this very specific problem.
The only messages we get in dmesg are the following.
Either (a few every day):
[10742708.006350] TCP: Treason uncloaked! Peer 220.127.116.11:32924/80
shrinks window 4049064122:4049064123. Repaired.
Or (more often):
[10755036.856217] Shorewall:net2all:DROP:IN=eth0 OUT=
DST=XX.XX.XX.43 LEN=404 TOS=0x00 PREC=0x00 TTL=114 ID=12366 PROTO=UDP
SPT=1057 DPT=1434 LEN=384
Neither the SRC nor the DST IPs correspond to the connections that are
stalled, since those occur on the local network.
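
Checking many such log entries by eye is error-prone, so here is a rough sketch of how the check could be scripted. It splits a netfilter-style log line, like the Shorewall one above, into KEY=value fields and tests whether its SRC or DST is one of the addresses involved in a stalled connection. The helper names (parse_netfilter_log, involves) and the addresses in the usage note are hypothetical. The sketch tolerates missing or empty fields, since the line quoted above shows no SRC and an empty OUT.

```python
import re

# KEY=value pairs as they appear in netfilter LOG output; the value may be empty.
FIELD_RE = re.compile(r"([A-Z]+)=(\S*)")


def parse_netfilter_log(line):
    """Return a dict of fields from one log line.

    LEN can appear twice (IP length, then UDP length); the first
    occurrence is kept.
    """
    fields = {}
    for key, value in FIELD_RE.findall(line):
        fields.setdefault(key, value)
    return fields


def involves(fields, addrs):
    """True if the packet's SRC or DST is one of the given addresses."""
    return fields.get("SRC") in addrs or fields.get("DST") in addrs
```

For example, with the local addresses of the two machines holding a stalled connection collected in a set such as {"192.168.1.10", "192.168.1.11"} (made-up values), calling involves(parse_netfilter_log(line), that_set) on every DROP line from dmesg would show whether the firewall ever dropped a packet belonging to those hosts.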