[TCP bug] stuck distcc connections in latest -git

From: Ingo Molnar
Date: Tue Jul 22 2008 - 07:22:09 EST



* Ingo Molnar <mingo@xxxxxxx> wrote:

> ok, have updated the testboxes to your latest push.
>
> Btw., otherwise the big networking pull held up pretty well on a
> healthy range of testboxes i have, [...]

hm, the distcc TCP hangs are back:

Distcc client box (quad, 10.0.1.16) running v2.6.24:

dione:~> netstat -nt | grep -vw TIME_WAIT | grep 3632
tcp 0 250455 10.0.1.16:55559 10.0.1.19:3632 ESTABLISHED
tcp 0 254743 10.0.1.16:56096 10.0.1.19:3632 ESTABLISHED
tcp 0 219617 10.0.1.16:55674 10.0.1.19:3632 ESTABLISHED

[ ^--- note the stuck send-queue ]

Distcc server box (16-way, 10.0.1.19) running very-latest:

phoenix:~> netstat -nt | grep 10.0.1.16 | grep 3632

tcp 0 0 10.0.1.19:3632 10.0.1.16:55559 ESTABLISHED
tcp 0 0 10.0.1.19:3632 10.0.1.16:56096 ESTABLISHED
tcp 0 0 10.0.1.19:3632 10.0.1.16:55674 ESTABLISHED

tcp 0 0 10.0.1.19:3632 10.0.1.16:34411 ESTABLISHED
tcp 0 0 10.0.1.19:3632 10.0.1.16:51094 ESTABLISHED
tcp 0 0 10.0.1.19:3632 10.0.1.16:60787 ESTABLISHED
tcp 0 0 10.0.1.19:3632 10.0.1.16:50874 ESTABLISHED

I.e. on the client side the send-queues are stuck while the connections sit in ESTABLISHED state, and the server side thinks they are proper established connections with nothing queued in either direction. Nobody makes any progress.
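The asymmetry above can be checked mechanically. Below is a minimal Python sketch that parses `netstat -nt` rows and flags ESTABLISHED connections with a non-empty Send-Q. The names `TcpConn`, `parse_netstat` and `queued_sends` are hypothetical helpers, not part of distcc or net-tools. The sketch assumes the default six-column `netstat -nt` layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). Per netstat(8), Send-Q on an established socket is the count of bytes not yet acknowledged by the remote host.

```python
from typing import NamedTuple


class TcpConn(NamedTuple):
    """One data row of `netstat -nt`: Proto Recv-Q Send-Q Local Foreign State."""
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse_netstat(text: str) -> list[TcpConn]:
    """Parse `netstat -nt` output, skipping header and blank lines."""
    conns = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: data rows have exactly six columns and start with "tcp".
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _, recv_q, send_q, local, foreign, state = fields
        conns.append(TcpConn(int(recv_q), int(send_q), local, foreign, state))
    return conns


def queued_sends(conns: list[TcpConn]) -> list[TcpConn]:
    """ESTABLISHED connections that still hold unacknowledged bytes in Send-Q."""
    return [c for c in conns if c.state == "ESTABLISHED" and c.send_q > 0]
```

Applied to the three client rows above, `queued_sends` returns all three connections. Applied to the server rows, it returns none. That is the mismatch: roughly 220 to 255 KB unacknowledged on one end and nothing pending on the other.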

Also note the final 4 connections on the server side - those are not
present on the client box.
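The server-only connections can be found by matching each server row against the client listing with the local and foreign addresses swapped. This is a sketch under the same six-column assumption. `endpoints` and `server_only` are hypothetical helpers. The comparison assumes both listings were taken at about the same time and that there is no NAT between the two boxes.

```python
def endpoints(text: str) -> list[tuple[str, str]]:
    """Return (local, foreign) address pairs from `netstat -nt` data rows."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0].startswith("tcp"):
            pairs.append((fields[3], fields[4]))
    return pairs


def server_only(client_text: str, server_text: str) -> list[str]:
    """Client addresses the server still tracks but the client does not list.

    A connection the server sees as (local=S, foreign=C) is the same
    connection the client sees as (local=C, foreign=S).
    """
    client_side = set(endpoints(client_text))
    return sorted(
        foreign
        for local, foreign in endpoints(server_text)
        if (foreign, local) not in client_side
    )
```

Run on the two listings above, `server_only` returns the four client ports 34411, 50874, 51094 and 60787.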

The hang seemed permanent (I waited a couple of minutes).
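One way to tell a permanent hang from a merely slow connection is to take two `netstat -nt` snapshots some minutes apart and look for a non-zero Send-Q that has not moved. The following is a heuristic sketch only. `send_queues` and `stalled` are hypothetical helpers, and the same six-column `netstat -nt` layout is assumed.

```python
def send_queues(text: str) -> dict[tuple[str, str], int]:
    """Map (local, foreign) -> Send-Q for ESTABLISHED `netstat -nt` rows."""
    queues = {}
    for line in text.splitlines():
        fields = line.split()
        if (len(fields) == 6 and fields[0].startswith("tcp")
                and fields[5] == "ESTABLISHED"):
            queues[(fields[3], fields[4])] = int(fields[2])
    return queues


def stalled(before: str, after: str) -> list[tuple[str, str]]:
    """Connections whose non-empty Send-Q is identical in both snapshots.

    Heuristic: an unchanged, non-zero Send-Q minutes apart suggests the peer
    is acknowledging nothing, though equal counts could be a coincidence.
    """
    old, new = send_queues(before), send_queues(after)
    return sorted(conn for conn, q in new.items() if q > 0 and old.get(conn) == q)
```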

Then I shut down distccd on the server side, and the failure propagated to the client:

distcc[18496] (dcc_pump_sendfile) ERROR: sendfile failed: Broken pipe
distcc[18496] (dcc_readx) ERROR: unexpected eof on fd4
distcc[18496] (dcc_r_token_int) ERROR: read failed while waiting for token "DONE"
distcc[18496] Warning: failed to distribute kernel/futex.c to ph/20, running locally instead

The server side lingered in FIN_WAIT2 for a bit:

Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.1.19:3632 10.0.1.16:56096 FIN_WAIT2
tcp 0 0 10.0.1.19:3632 10.0.1.16:55559 FIN_WAIT2

I retried the same build 10 times and could not reproduce it, so this is again a hard-to-reproduce condition (and there's no chance of getting a proper tcpdump at these traffic levels either).

Ingo