Re: Solaris 100K TCP connections, good example? was:[Fwd: [Fwd:

Chuck Lever (cel@monkey.org)
Tue, 28 Sep 1999 13:21:12 -0400 (EDT)


thanks for your response!

On Tue, 28 Sep 1999, David S. Miller wrote:
> 1) Some sockets do need a bona fide inode attached to them,
> for example UNIX domain sockets bound to a name in the filesystem
> space.

agreed -- wouldn't want to change that.

> 2) In my experiments over a year ago, I do recall one statistic, that
> even with all of this overhead the kernel could manage ~40,000
> full sockets with ~64MB of ram. This was on a 64-bit system.

i still have trouble creating as many as 25K sockets on ia32-class
systems -- for instance, on my 64M laptop... that is the source of my
initial concern. by contrast, i was easily able to create 64K open file
descriptors.

btw this is on 2.3.17; haven't tried on 2.2 kernels.

> 3) Most "high end, high connection rate" systems are not full of
> established state full sockets, no, but rather they are consumed
> mostly by sockets in TIME_WAIT. This case we have in fact
> optimized, we throw away the inode/dentry/file when we enter
> this state and even use a miniature version of "struct sock" so
> we only keep around the pieces of information we need. This
> structure is about 120 bytes on a 64-bit machine last time I
> checked.
>
> #3, in my experience, is by far the greatest thing to plane down
> for top-notch TCP connection rates and performance. In fact, work
> on this continues, we have experimental TIME_WAIT recycling code
> in 2.3.x which attempts to safely kill TIME_WAIT state sockets
> early. There are some final details to attend to, because if we
> do this we must do it correctly, so there is a chance it may not
> be finished and enabled in time for the final 2.4.x release.
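
if i understand the mini-socket correctly, it amounts to something
like this (an illustrative sketch with made-up names, not the actual
2.3.x structure):

	/* a TIME_WAIT socket needs only its connection identity plus
	 * enough sequence and timer state to answer stray segments
	 * and expire itself -- no file, dentry, inode, or full
	 * struct sock. */
	struct tw_sock {
		__u32		saddr, daddr;	/* connection identity */
		__u16		sport, dport;
		__u32		rcv_nxt;	/* to answer stray segments */
		unsigned long	ttd;		/* jiffy at which to expire */
		struct tw_sock	*next;		/* hash chain link */
	};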

that sounds like a very good idea. but i'm still concerned about the
following issues with socket management scalability:

1. memory allocated for inodes is never returned to the free page pool.
andrea has a nice patch that can help here.

2. once there are more than max_inodes inodes (max_inodes can't be more
than 16384 in the current kernel), allocating a new inode means
garbage-collecting the entire inode cache. andrea's patch also
addresses this problem.

3. i was wrong to state that these inodes are hashed; get_empty_inode()
can't hash them because, as alex v. pointed out, their i_sb field is
NULL. they are placed on the in_use list, however, which means that
walking this list can get expensive (as in free_inodes(); see the first
sketch after this list).

4. creating a socket is slow -- much slower than simply opening a file
descriptor -- and it is complicated. perhaps socket creation could be
made lazier, or more efficient. the only thing the socket code ever
uses the dentry for, for instance, is to find the inode, so the dentry
handling could perhaps be eliminated for network sockets, with inode
lookup happening some other way (see the second sketch after this
list). even more radical would be to adopt your simpler socket struct
scheme for the entire lifetime of a network socket.
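
to make the cost in points 2 and 3 concrete: every anonymous socket
inode sits on the global in_use list, so any scan of that list touches
all of them. roughly (paraphrasing fs/inode.c, not quoting it):

	struct list_head *tmp;

	/* free_inodes() and the inode-cache garbage collector walk
	 * the whole inode_in_use list; unhashed socket inodes
	 * (i_sb == NULL) can't be skipped, so the walk is O(n) in
	 * the number of open sockets. */
	for (tmp = inode_in_use.next; tmp != &inode_in_use; tmp = tmp->next) {
		struct inode *inode = list_entry(tmp, struct inode, i_list);
		/* ... examine inode, reclaim it only if unused ... */
	}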
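
and for point 4, what i have in mind is something like this (purely
hypothetical -- this is not what net/socket.c does today):

	/* hypothetical: have struct file point at the socket
	 * directly, so sockfd_lookup() could skip the
	 * file->f_dentry->d_inode->u.socket_i walk entirely. */
	static inline struct socket *socket_of(struct file *file)
	{
		return (struct socket *) file->private_data;
	}

that would save a d_alloc() and the inode bookkeeping per socket, at
the cost of special-casing fstat() and friends on socket descriptors.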

i'm happy to help with any of this.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/linux-scalability/
