Re: question about 3sec timeouts with tcp

From: H. Willstrand
Date: Mon Mar 31 2008 - 09:31:34 EST


On Mon, Mar 31, 2008 at 1:54 PM, Leo <neleo@xxxxxxx> wrote:
> I have also tested it with server and client directly connected (without
> switch). And even if server and client are the same machine (listen AND
> connect to 127.0.0.1 on port 3306, interface lo) this problem is
> reproducable! Therefore it can't be a cable or switch problem.
>
> But one thing is different when testing with localhost: there is a RST
> packet!
>
> tcpdump -i lo
>
> 13:39:55.934575 IP localhost.50171 > localhost.mysql: S
> 285135912:285135912(0) win 32792 <mss 16396,sackOK,timestamp 627887
> 0,nop,wscale 7>
> 13:39:55.934590 IP localhost.mysql > localhost.50171: . ack 103990558
> win 256 <nop,nop,timestamp 627887 626728>

Hmm... next packet to be acked: 285135912 + 1, server sends ack 103990558
This might be the reason for the reset.

> 13:39:55.934598 IP localhost.50171 > localhost.mysql: R
> 103990558:103990558(0) win 0
> 13:39:58.931261 IP localhost.50171 > localhost.mysql: S
> 285135912:285135912(0) win 32792 <mss 16396,sackOK,timestamp 628187
> 0,nop,wscale 7>
> 13:39:58.931277 IP localhost.mysql > localhost.50171: S
> 334398077:334398077(0) ack 285135913 win 32768 <mss
> 16396,sackOK,timestamp 628187 628187,nop,wscale 7>

Now the server sends ack 285135913 (285135912 + 1), and the connection is OK.

> 13:39:58.931289 IP localhost.50171 > localhost.mysql: . ack 1 win 257
> <nop,nop,timestamp 628187 628187>
> 13:39:58.931414 IP localhost.50171 > localhost.mysql: F 1:1(0) ack 1 win
> 257 <nop,nop,timestamp 628187 628187>
> 13:39:58.931437 IP localhost.mysql > localhost.50171: F 1:1(0) ack 2 win
> 256 <nop,nop,timestamp 628187 628187>
> 13:39:58.931452 IP localhost.50171 > localhost.mysql: . ack 2 win 257
> <nop,nop,timestamp 628187 628187>
>
> Any idea?

Seams to be a issue at server side... What is the kernel version at
server-side? Kernel config?

Br H. Willstrand

>
> Kind regards,
> Leo
>
>
>
>
> H. Willstrand wrote:
> > On Mon, Mar 31, 2008 at 12:50 PM, Leo <neleo@xxxxxxx> wrote:
> >
> >> Here comes the tcpdump (the server has IP 192.168.1.1 listening on port
> >> 3306 and the client has IP 192.168.1.99).
> >>
> >> tcpdump on the server:
> >>
> >> 11:24:44.226373 IP 192.168.1.99.39028 > 192.168.1.1.3306: S
> >> 3068401272:3068401272(0) win 5840 <mss 1460,sackOK,timestamp 10592352
> >> 0,nop,wscal e 6>
> >> 11:24:44.226389 IP 192.168.1.1.3306 > 192.168.1.99.39028: S
> >> 3052139931:3052139931(0) ack 3068401273 win 5792 <mss
> >> 1460,sackOK,timestamp 10478 87 10592352,nop,wscale 7>
> >> 11:24:47.226673 IP 192.168.1.99.39028 > 192.168.1.1.3306: S
> >> 3068401272:3068401272(0) win 5840 <mss 1460,sackOK,timestamp 10595352
> >> 0,nop,wscale 6>
> >> 11:24:47.226715 IP 192.168.1.1.3306 > 192.168.1.99.39028: S
> >> 3052139931:3052139931(0) ack 3068401273 win 5792 <mss
> >> 1460,sackOK,timestamp 1048187 10592352,nop,wscale 7>
> >> 11:24:47.226957 IP 192.168.1.99.39028 > 192.168.1.1.3306: . ack 1 win 92
> >> <nop,nop,timestamp 10595352 1048187>
> >> 11:24:47.227488 IP 192.168.1.99.39028 > 192.168.1.1.3306: F 1:1(0) ack 1
> >> win 92 <nop,nop,timestamp 10595352 1048187>
> >> 11:24:47.227580 IP 192.168.1.1.3306 > 192.168.1.99.39028: F 1:1(0) ack 2
> >> win 46 <nop,nop,timestamp 1048187 10595352>
> >> 11:24:47.227810 IP 192.168.1.99.39028 > 192.168.1.1.3306: . ack 2 win 92
> >> <nop,nop,timestamp 10595353 1048187>
> >>
> >> tcpdump on the client:
> >>
> >> 11:24:44.045735 IP 192.168.1.99.39028 > 192.168.1.1.3306: S
> >> 3068401272:3068401272(0) win 5840 <mss 1460,sackOK,timestamp 10592352
> >> 0,nop,wscale 6>
> >> 11:24:47.045710 IP 192.168.1.99.39028 > 192.168.1.1.3306: S
> >> 3068401272:3068401272(0) win 5840 <mss 1460,sackOK,timestamp 10595352
> >> 0,nop,wscale 6>
> >> 11:24:47.045911 IP 192.168.1.1.3306 > 192.168.1.99.39028: S
> >> 3052139931:3052139931(0) ack 3068401273 win 5792 <mss
> >> 1460,sackOK,timestamp 1048187 10592352,nop,wscale 7>
> >> 11:24:47.046015 IP 192.168.1.99.39028 > 192.168.1.1.3306: . ack 1 win 92
> >> <nop,nop,timestamp 10595352 1048187>
> >> 11:24:47.046541 IP 192.168.1.99.39028 > 192.168.1.1.3306: F 1:1(0) ack 1
> >> win 92 <nop,nop,timestamp 10595352 1048187>
> >> 11:24:47.046845 IP 192.168.1.1.3306 > 192.168.1.99.39028: F 1:1(0) ack 2
> >> win 46 <nop,nop,timestamp 1048187 10595352>
> >> 11:24:47.046880 IP 192.168.1.99.39028 > 192.168.1.1.3306: . ack 2 win 92
> >> <nop,nop,timestamp 10595353 1048187>
> >>
> >> I can also provide a strace if required (alternatively you can find the
> >> tcpdump and the strace in my original posting, e.g.
> >> http://marc.info/?l=linux-net&m=120214156112338&w=2).
> >>
> >> Kind regards,
> >> Leo
> >>
> >>
> >
> > Well, the server sends back the SYN/ACK. Your client is not receiving it...
> > Normally I will say a network problem (switch / router), please check
> > that once more.
> >
> > (There is / was a bug in PCAP before which lost packages in tcpdump,
> > but this seams not be the case here)
> >
> > Br H.Willstrand
> >
> >
> >
> >>
> >> H. Willstrand wrote:
> >> > On Mon, Mar 31, 2008 at 11:33 AM, Leo <neleo@xxxxxxx> wrote:
> >> >
> >> >> Yes, we have changed the back_log settings in my.cnf. We also tried
> >> >> various combinations of the kernel parameters you mentioned in your
> >> >> email - without success.
> >> >>
> >> >> Have you made any changes on the client (=webserver)? Perhaps it's not a
> >> >> problem of the server (=mysql). As I figured out in my original posting
> >> >> it looks as if the client didn't receive the SYN/ACK package from the
> >> >> server (keep in mind that it is reproducable with lo interface,
> >> >> therefore it's not a hardware problem with cables, switches, ...):
> >> >>
> >> >> "When you look at the TCP pakets transmitted between server and client
> >> >> (tcpdump) you will see that the client sends a SYN and the server
> >> >> answers with a SYN/ACK. However the client will never receive the
> >> >> SYN/ACK. Therefore the client resends a SYN after exactly 3 seconds. The
> >> >> server answers with a SYN/ACK again and this time the client receives
> >> >> the paket. Now the connection is working."
> >> >>
> >> >>
> >> >
> >> > Hi!
> >> >
> >> > Can you please post that tcpdump? (Just to check the first ACK).
> >> >
> >> > Br H.Willstrand
> >> >
> >> >
> >> >> On friday I received an email from another user having the same problem.
> >> >> As I mentioned in my last email: we are not alone ;-)
> >> >>
> >> >> Many thanks for your help.
> >> >>
> >> >> Kind regards,
> >> >> Leo
> >> >>
> >> >>
> >> >>
> >> >> Marlon de Boer wrote:
> >> >> > Leo wrote:
> >> >> >
> >> >> >> We discovered the same problem two month ago (even 3 sec connect
> >> >> >> problems between webserver and mysql backend). I have posted it to
> >> >> >> the list (subject "TCP connect hangs for 3 seconds", 2008-02-04)
> >> >> >> but didn't get any answer yet. When I read your mail it was a
> >> >> >> flicker of hope.
> >> >> >>
> >> >> >> This is not a mysql issue! You can easily reproduce it with a few
> >> >> >> lines of C code (see below, it's similar to your program) running
> >> >> >> against any open port on the server (e.g. netcat: nc -k -l
> >> >> >> <PORTNUMBER>). We have also tested it on different hardware and
> >> >> >> kernel versions (up to 2.6.24.4) and we can deliver further
> >> >> >> information if requested ...
> >> >> >>
> >> >> >> Unfortunately the kernel parameter you mentioned in your second
> >> >> >> mail (net.core.somaxconn) didn't solve the problem.
> >> >> >>
> >> >> >> I think there are many other people out there having this problem
> >> >> >> unconsciously. Can anybody help?
> >> >> >>
> >> >> >>
> >> >> > I used the c code below to test the server side. After playing with
> >> >> > the kernel settings, some code,
> >> >> > setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) and
> >> >> > listen(sd, 8192) the timeouts disappeared.
> >> >> >
> >> >> > Did you changed the settings in mysql.cnf? back_log = 8192, and
> >> >> > restarted the mysql daemon? Else try to raise it to 16384 or even higher.
> >> >> >
> >> >> > Regards
> >> >> > Marlon
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> >>
> >> >>
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-net" in
> >> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> >
> >> >
> >>
> >>
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-net" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html