IPv6: Connection reset/timeout under heavy load

From: Agoston Horvath
Date: Tue Mar 27 2007 - 06:05:54 EST


Hello,

I'm trying to add ipv6 support to the RIPE whois-server. I'm going with the dual-stack address familiy independent solution (/proc/sys/net/ipv6/bindv6only is set to 0).
I've written a small piece of code which tests how the server behaves under heavy load. It basically creates a specified number of threads and hammers the query port of the server with logged real-world queries. This works fine over ipv4. But if I switch to ipv6 (meaning, instead of the ipv4 address of eth0, I give the ipv6 one to getaddrinfo()), strange connection reset/timeout problems arise. This is especially strange because it's all AF-independent, so the very same piece of code runs fine for ipv4, but fails for ipv6.

I've created a small excerpt of the connection code (see attachment), which can be used to reproduce the problem. I've played around with number of clients/servers and the number of threads, and came to the conclusion that the problem is most probably in either in libc or in the kernel.
Strangest thing is that if I set the client to use 1 thread only, it works well. If I either start 2 pieces of 1-threaded clients or 1 piece of 2-threaded client, they start throwing errors.

I've even fiddled with sysctls tcp_fin_timeout, tcp_tw_recycle and tcp_tw_reuse, but it didn't change anything. Setting SO_LINGER to a smaller value also didn't help.

I'm using kernel 2.6.16.29, libc is 2.3.2.ds1-22sarge5. However, same result was achieved using kernel version 2.6.13 on the same box.

I'm out of further ideas. If anyone can give me some pointers about what could be wrong, please, don't spare me.

Thanks,
Agoston

Attachment: load_test.tar.gz
Description: GNU Zip compressed data