Re: poll() blocked / packets not received ?

From: Nicolas Cannasse
Date: Mon Oct 20 2008 - 06:47:16 EST

We have Shorewall installed and enabled, but what seems strange is that the problem depends on multithreading. It also occurs much more often on the 4-core machines than on the 2-core ones (both with Hyper-Threading enabled). We're using kernel 2.6.20-15-server (#2 SMP) provided by Ubuntu.

Any tip on how we could fix this or investigate further would be appreciated. After one month of debugging we're really out of ideas now.


Your usage pattern is a very common one, I highly doubt you are experiencing
a kernel bug here or many people (including myself) would be complaining.

Shorewall sounds like it might be suspect. Are FINs not coming in when the
remote closes? You can look in the output of netstat to see what state the
TCP connection is in. Is it still ESTABLISHED?

Yes, it's still ESTABLISHED, but we can't see the corresponding connection on the other machine when running netstat there. I'm not a TCP expert, so I'm not sure in which cases this can occur.
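
A connection that is ESTABLISHED on one side and missing on the other is usually called half-open: the peer has dropped its state and the local side never saw a FIN or RST. A minimal sketch of one way to keep poll() from blocking forever in that situation, assuming Python on Linux: turn on TCP keepalive and always poll with a timeout. The helper names (enable_keepalive, wait_readable) and the timing values are hypothetical, chosen only for illustration, and this is not necessarily what the application in this thread does.

```python
import select
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle peer so a dead connection is noticed.

    idle/interval/count are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These three options are platform-specific; skip any that are missing.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def wait_readable(sock, timeout_ms):
    """Poll one socket with a timeout.

    Returns "timeout", "error", or "readable" (data or EOF is pending).
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    if not events:
        return "timeout"
    _, flags = events[0]
    if flags & (select.POLLERR | select.POLLNVAL):
        return "error"
    return "readable"
```

With keepalive enabled, a peer that has silently disappeared should eventually cause the kernel to fail the socket, at which point poll() returns instead of blocking; the explicit timeout additionally gives the application a chance to log or close long-idle connections itself.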

I agree with your comment in general, except that we have been running the same application in a single-threaded environment for years without running into this very specific problem.

The only messages we get in dmesg are the following:

Either (a few every day):

[10742708.006350] TCP: Treason uncloaked! Peer shrinks window 4049064122:4049064123. Repaired.

Or (more often):

[10755036.856217] Shorewall:net2all:DROP:IN=eth0 OUT= MAC=00:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:00 SRC= DST=XX.XX.XX.43 LEN=404 TOS=0x00 PREC=0x00 TTL=114 ID=12366 PROTO=UDP SPT=1057 DPT=1434 LEN=384

Neither the SRC nor the DST IPs in these messages correspond to the connections that are stalled, since the stalled connections are all on the local network.
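
The check described above (comparing the logged SRC/DST against the endpoints of the stalled connections) can be sketched roughly as follows, assuming Python. The function names and the example address set are hypothetical; the parser only relies on the KEY=VALUE layout visible in the log line quoted above.

```python
def parse_drop_line(line):
    """Split a firewall log line into a dict of its KEY=VALUE fields."""
    fields = {}
    for token in line.split():
        if "=" not in token:
            continue  # e.g. the leading "[timestamp]" token
        key, value = token.split("=", 1)
        # "Shorewall:net2all:DROP:IN=eth0" -> key "IN"
        key = key.rsplit(":", 1)[-1]
        # LEN appears twice (IP length, then UDP length); keep the first.
        fields.setdefault(key, value)
    return fields


def involves(fields, addresses):
    """True if the logged SRC or DST is one of the given addresses."""
    return fields.get("SRC") in addresses or fields.get("DST") in addresses


line = ("[10755036.856217] Shorewall:net2all:DROP:IN=eth0 OUT= "
        "SRC= DST=XX.XX.XX.43 LEN=404 TOS=0x00 PREC=0x00 TTL=114 "
        "ID=12366 PROTO=UDP SPT=1057 DPT=1434 LEN=384")
fields = parse_drop_line(line)
# Hypothetical local-network endpoints of a stalled connection.
stalled = {"192.168.0.10", "192.168.0.11"}
print(fields["PROTO"], fields["DPT"], involves(fields, stalled))
```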

