Re: Send-Q on UDP socket growing steadily - why?

From: Deomid Ryabkov
Date: Tue May 13 2008 - 15:03:44 EST


Denys Vlasenko wrote:
On Sunday 30 March 2008 07:43, Deomid Ryabkov wrote:
This started recently and I'm at a loss as to why.
Send-Q on a moderately active UDP socket keeps growing steadily until it reaches ~128K (wmem_max?) at which point socket writes start failing.
The application in question is standard ntpd from Fedora 7, kernel is the latest available for the distro, that is
2.6.23.15-80.fc7 #1 SMP Sun Feb 10 16:52:18 EST 2008 x86_64

BIND, running on the same machine, does not exhibit this problem, but that may be because it does not get nearly as much load as ntpd,
which is part of pool.ntp.org. That said, the load is really not very high, on the order of 10 QPS, and the machine is 99+% idle.
ntpd seems to be doing its usual select-recvmsg-sendto routine, nothing out of the ordinary.
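
For comparing the ~128K figure against the buffer limits on a given box, something like the following may help. It is only a sketch: it assumes Linux with /proc mounted, and the helper names (read_sysctl_int, udp_send_buffer_limits) are made up for illustration. It reports the limits in effect for a freshly created UDP socket, not for ntpd's own socket.

```python
import socket
from pathlib import Path


def read_sysctl_int(name):
    """Read an integer sysctl such as 'net.core.wmem_max' via /proc/sys.

    Returns None if the file is missing or unreadable (e.g. not on Linux).
    """
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def udp_send_buffer_limits():
    """Return the send-buffer size of a fresh UDP socket plus the system limits."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Size the kernel assigned to this new socket's send buffer.
        sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return {
        "so_sndbuf": sndbuf,
        "wmem_default": read_sysctl_int("net.core.wmem_default"),
        "wmem_max": read_sysctl_int("net.core.wmem_max"),
    }


if __name__ == "__main__":
    for key, value in udp_send_buffer_limits().items():
        print(f"{key}: {value}")
```

If the Send-Q value at which writes start failing is close to so_sndbuf, the socket's own send buffer, rather than wmem_max as such, is the more likely ceiling.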

Where does it send (or try to send) these packets?
all over the world :)

I managed to reproduce something like this when I try to send
UDP packets to a nonexistent host on the local subnet. The kernel
tries to find it: it emits ARP probes, but no reply comes. As long
as the kernel doesn't know how to send the queued UDP packet, I see
a nonempty queue.

However, in my simple case the kernel decides that it is a lost cause
in a few seconds, and drops the packets (queue len 0).
ok, it happened again.
no, it's not arp - there are no <incomplete> entries in the arp table.

I imagine that with routing table tricks and/or iptables/arptables
you may end up in a situation where the kernel is stuck in
"I don't know how to send these packets" mode forever.
nothing fancy on this box - there are no firewall rules, except for one NAT rule that does not apply to these packets.
You can strace ntpd, get a list of IPs it is trying to send packets
to, and then do "echo TEST | nc -u <ip> 123" for each of these.
Will nc's queue become nonempty (at least for some IP)?
as far as i can tell, apart from this one socket networking on the box works normally.
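
The nc test suggested above could also be scripted. This is a rough sketch, not a tested diagnostic: it assumes Linux, where the SIOCOUTQ ioctl on a UDP socket shares its value with termios.TIOCOUTQ and reports the bytes still held in the socket's send queue. The function name send_queue_after_probe and the one-second default wait are made up for illustration.

```python
import fcntl
import socket
import struct
import termios
import time


def send_queue_after_probe(ip, port=123, payload=b"TEST\n", delay=1.0):
    """Send one UDP datagram to (ip, port) and return the socket's unsent byte count.

    Assumes Linux: SIOCOUTQ has the same value as TIOCOUTQ there.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (ip, port))
        # Give the kernel a moment to transmit or drop the datagram.
        time.sleep(delay)
        raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
        return struct.unpack("i", raw)[0]


if __name__ == "__main__":
    import sys

    # Usage: python probe.py <ip> [<ip> ...]
    for address in sys.argv[1:]:
        print(address, send_queue_after_probe(address))
```

A value that stays above zero for some destination would point at the same "kernel does not know how to send this" condition described earlier; zero everywhere would suggest the problem is specific to ntpd's socket.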

this is what i see in netstat:

udp 0 125280 89.111.168.177:123 0.0.0.0:*
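
To keep an eye on this over time, the netstat output can be parsed rather than read by hand. A minimal sketch, assuming the column order shown above (proto, Recv-Q, Send-Q, local address, foreign address); the function names and the threshold are hypothetical.

```python
def parse_netstat_udp_line(line):
    """Parse one UDP line of netstat output; return None for any other line."""
    fields = line.split()
    if len(fields) < 5 or not fields[0].startswith("udp"):
        return None
    return {
        "proto": fields[0],
        "recv_q": int(fields[1]),
        "send_q": int(fields[2]),
        "local": fields[3],
        "remote": fields[4],
    }


def sockets_over_threshold(netstat_output, threshold_bytes):
    """Return parsed UDP entries whose Send-Q is at or above threshold_bytes."""
    hits = []
    for line in netstat_output.splitlines():
        entry = parse_netstat_udp_line(line)
        if entry is not None and entry["send_q"] >= threshold_bytes:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    sample = "udp 0 125280 89.111.168.177:123 0.0.0.0:*"
    # 100000 is an arbitrary illustrative threshold, not a kernel limit.
    print(sockets_over_threshold(sample, 100000))
```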

this is what strace looks like (nothing suspicious):

select(26, [16 17 18 19 20 21 22 23 24 25], NULL, NULL, {0, 382485}) = 1 (in [22], left {0, 125000})
select(26, [16 17 18 19 20 21 22 23 24 25], NULL, NULL, {0, 0}) = 1 (in [22], left {0, 0})
recvmsg(22, {msg_name(16)={sa_family=AF_INET, sin_port=htons(101), sin_addr=inet_addr("80.250.211.2")}, msg_iov(1)=[{"#\3\n\356\0\0\17v\0\0\25 \302!\277E\313\324`\257\317\3K\16\0\0\0\0\0\0\0\0"..., 1092}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 48
recvmsg(22, 0x7fffb14354a0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(22, "$\4\n\354\0\0\r\345\0\0!\31\n\0\0\2\313\324aR!\334\3242\313\324a\250\t\32\206\224"..., 48, 0, {sa_family=AF_INET, sin_port=htons(101), sin_addr=inet_addr("80.250.211.2")}, 16) = -1 EAGAIN (Resource temporarily unavailable)
select(26, [16 17 18 19 20 21 22 23 24 25], NULL, NULL, {0, 123523}) = 1 (in [22], left {0, 73000})
select(26, [16 17 18 19 20 21 22 23 24 25], NULL, NULL, {0, 0}) = 1 (in [22], left {0, 0})
recvmsg(22, {msg_name(16)={sa_family=AF_INET, sin_port=htons(123), sin_addr=inet_addr("217.77.53.12")}, msg_iov(1)=[{"\31\3\4\372\0\0\16\200\0\7\334F\301}\217\214\313\324R\4+g\\/\0\0\0\0\0\0\0\0"..., 1092}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 48
recvmsg(22, 0x7fffb14354a0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(22, "\31\4\4\354\0\0\r\345\0\0!\31\n\0\0\2\313\324aR!\334\3242\313\324a\236$\366\274\212"..., 48, 0, {sa_family=AF_INET, sin_port=htons(123), sin_addr=inet_addr("217.77.53.12")}, 16) = -1 EAGAIN (Resource temporarily unavailable)
select(26, [16 17 18 19 20 21 22 23 24 25], NULL, NULL, {0, 71771}) = 1 (in [22], left {0, 39000})
select(26, [16 17 18 19 20 21 22 23 24 25], NULL, NULL, {0, 0}) = 1 (in [22], left {0, 0})
recvmsg(22, {msg_name(16)={sa_family=AF_INET, sin_port=htons(29080), sin_addr=inet_addr("213.33.220.118")}, msg_iov(1)=[{"\331\3\4\372\0\0\7\v\0\2\6\262Yl|\4\313\324S\272\322\235e\326\313\324#\260\247\367\352\r"..., 1092}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 48
recvmsg(22, 0x7fffb14354a0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(22, "\31\4\4\354\0\0\r\345\0\0!\31\n\0\0\2\313\324aR!\334\3242\313\324a\247;\244\206\\"..., 48, 0, {sa_family=AF_INET, sin_port=htons(29080), sin_addr=inet_addr("213.33.220.118")}, 16) = -1 EAGAIN (Resource temporarily unavailable)

etc, etc, etc
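
For collecting the list of destinations from a trace like the one above, a small filter is enough. This is a sketch that assumes strace prints sendto() in the format shown here; the regular expression and the function name are illustrative and may need adjusting for other strace versions.

```python
import re

# Matches a sendto() line as strace printed it in the trace above.
SENDTO_RE = re.compile(
    r'sendto\((?P<fd>\d+),.*sin_port=htons\((?P<port>\d+)\), '
    r'sin_addr=inet_addr\("(?P<ip>[0-9.]+)"\)\}, \d+\) = '
    r'(?P<ret>-?\d+)(?: (?P<errno>[A-Z]+))?'
)


def failed_sendto_destinations(strace_lines):
    """Return (ip, port, errno) for every sendto() call that returned -1."""
    results = []
    for line in strace_lines:
        match = SENDTO_RE.search(line)
        if match and match.group("ret") == "-1":
            results.append(
                (match.group("ip"), int(match.group("port")), match.group("errno"))
            )
    return results


if __name__ == "__main__":
    import sys

    # Usage: strace -p <ntpd pid> 2>&1 | python failed_sendto.py
    for ip, port, err in failed_sendto_destinations(sys.stdin):
        print(ip, port, err)
```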


--
vda


--
Deomid Ryabkov aka Rojer
myself@xxxxxxxxxxx
rojer@xxxxxxxxxxxx
ICQ: 8025844
