Re: question about 3sec timeouts with tcp
From: H. Willstrand
Date: Tue Apr 01 2008 - 14:28:25 EST
On Tue, Apr 1, 2008 at 7:59 PM, Gabriel Barazer <gabriel@xxxxxxxx> wrote:
>
> On 04/01/2008 7:17:31 PM +0200, Leo <neleo@xxxxxxx> wrote:
> > H. Willstrand wrote:
> >> On Tue, Apr 1, 2008 at 5:43 PM, Gabriel Barazer <gabriel@xxxxxxxx> wrote:
> >>
> >>> On 04/01/2008 4:43:20 PM +0200, Brett Paden <paden@xxxxxxxxxxxx> wrote:
> >>> >> If I'm right, Brett's problem lies in the test client (provided in
> >>> >> the first mail). This probably has to do with the number of ports
> >>> >> opened and closed during a short time period.
> >>> >
> >>> > My test client is designed to simulate the sort of load our production
> >>> > databases and web servers see. We're talking on the order of 100-400
> >>> > connections per second. On an unloaded server the 3000ms delays occur
> >>> > right around 400 connections a second, but we have seen them at lower
> >>> > connection rates. Are you suggesting that we could do something simple
> >>> > (like reap TIME_WAIT connections) to alleviate the problem?
> >>>
> >>> Using tcp_tw_recycle / tcp_tw_reuse doesn't solve the problem on either
> >>> the client or the server. I tested with and without these options
> >>> enabled, and also disabled netfilter's connection tracking; none of
> >>> this solved the delay. If even the "lo" interface is affected, the
> >>> cause is definitely in the network stack and not in the device drivers.
> >>>
> >>> Here is a thread I started on LKML about this very same bug.
> >>> http://lkml.org/lkml/2008/3/14/353
> >>> There is a forum thread with French hosting providers talking about it
> >>> (if some of you read French:
> >>> http://www.webmasterclub.fr/forum/topic,59486,0.html).
> >>>
> >>> We are far from being alone!
> >>>
> > Welcome to the club, Gabriel!
> >>> Gabriel
>
> How lucky I am!
> I suspect there are many other people out there having this problem;
> they just don't notice these delays on small infrastructures, and
> this bug doesn't actually cause a connection error, "only" an
> unacceptable delay on moderately to heavily loaded servers.
>
>
> >> Ok, seems to be the same issue that Leo has (has nothing to do with
> >> the Brett / Marlon issue, only common denominator is the 3000ms).
> >>
> > But Gabriel is also talking about 3 second timeouts on the client as
> > Brett and I did. I have read Gabriel's description on the provided link
> > and it seems to be exactly the same problem. I think Brett can confirm
> > this ...
> >> This issue is probably caused by the server delivering a miscalculated
> >> SYN/ACK (the ack number is miscalculated, see my second mail).
> >>
> > When you look at my first tcpdump with two machines as server and client
> > then you can see that there are no miscalculated SYN/ACK packets from
> > the server (and therefore no RST packet from the client). All packets
> > have the right number but the client never receives the SYN/ACK packet
> > from the server. Only in the lo test are there RST packets and wrong
> > packet numbers. But as I told you in my last email I think this is a
> > different problem and not important for us. We should ignore the lo test
> > and concentrate on the "real" problem of Brett, Gabriel and myself (and
> > even a lot of other people out there).
>
> I confirm that there is no problem with the sequence numbers. Attached is
> the pcap compatible capture of the relevant packets (608 bytes, 6
> packets total: 2 for the failed handshake, 3 for the successful one and
> 1 for the first mysql data packet). This capture has been filtered to
> show only the relevant packets and done in promiscuous mode.
>
> Gabriel
>
I'm missing the tcpdump...
And yes, the localhost case is one issue (Leo's second tcpdump with the lo interface).
We all experience the 3000ms delay, so please post your tcpdumps, Gabriel.
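
For anyone who wants to reproduce the delay while the captures are
collected, here is a minimal sketch of a connect-latency probe. It is not
Brett's test client; the function names and the 2.5 second cut-off are
assumptions made for illustration. It opens connections one after another,
so it will not reach 400 connections a second by itself; run several
copies in parallel to raise the rate.

```python
import socket
import time


def time_connect(host, port, timeout=10.0):
    """Return the seconds taken by one TCP connect(), or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def summarize(latencies, slow_threshold=2.5):
    """Split connect results into successes, failures and slow connects.

    slow_threshold is an assumed cut-off: a connect that takes longer is
    treated as a candidate for the 3000ms delay discussed in this thread.
    """
    ok = [t for t in latencies if t is not None]
    slow = [t for t in ok if t >= slow_threshold]
    return {"ok": len(ok), "failed": len(latencies) - len(ok), "slow": slow}


def probe(host, port, attempts):
    """Open `attempts` connections in sequence and summarize the timings."""
    return summarize([time_connect(host, port) for _ in range(attempts)])
```

Calling probe("db.example.net", 3306, 1000) (hypothetical host and port)
returns a dict whose "slow" list holds every connect time at or above the
cut-off. A cluster of values just above 3 seconds would match the delay we
are all seeing, and the timestamps can then be matched against the tcpdump.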
//HW