Re: > 15,000 Simultaneous Connections

Steve Underwood (steveu@netpage.com.hk)
Mon, 06 Sep 1999 08:16:37 +0000


Ed Hall wrote:

> Lincoln Dale wrote:
> > in order to be able to handle large numbers of concurrent connections (open
> > sockets), there are really only two ways of doing it efficiently:
> >
> > (1) single process using threads
> > (2) single process, event-driven queues
> >
> > with (1), you have a slightly higher overhead than a well-written (2),
> > however you can scale to multiprocessor with no additional effort.
> >
> > (2) will always be best, but is typically one of the hardest to code for.
> >
>
> Whether (2) is best or not depends upon the kernel interface. If a
> scan of 15,000 file descriptors is required for each event (such as
> a straightforward design based on select()) then (2) could be nearly
> as bad as a naive accept()/fork() model, while a thread-per-connection
> model might scale better. If the kernel somehow passed an event list
> to the server process more efficiently than via select() or poll(),
> then I would agree that (2) is probably always the better way, perhaps
> with extensions into a thread-per-processor regime on SMP's.
>
> Achieving the best efficiency involves coupling the arrival of a packet
> to the object (explicit or implicit) handling that particular connection
> in the most direct way possible, modulo buffering. Eliminating linear
> searches from that coupling is important--a scan of 15,000 descriptors
> is going to be a significant hit, no matter what other efficiencies are
> involved, since it is O(n^2) (where n is the number of connections). This
> is why a threaded model might work better for large numbers of connections,
> assuming that thread scheduling is better than O(n^2). (Is it?)

So, the solution is to fix the clunky design of select, right? I'm not convinced
that dealing with a huge number of threads is any better than scanning a huge
select, anyway. Especially on a processor like the x86, which has scanning
instructions that can be used to scan the list quite quickly.

> I've actually implemented high-efficiency servers based on a combination
> of (1) and (2); in essence, a small pool of threads each handling a number
> of connections using an event-based model. At least for my project (which
> involved high connection rates more than large numbers of sustained
> connections), one or two threads per processor seemed to be the most
> efficient configuration--but this was under Solaris, not Linux, though
> I certainly was able to achieve good performance (>1000 connects/second
> on a UP P2/333) on the latter in tests.

Of course, using a select based approach on a multi-processor machine requires
some sort of hybrid like this. Could it be that it works well on a single
processor machine because it avoids either a huge select, or a crazy number of
threads? I think the balancing effect of this type of architecture is why it
seems to works well. I still think fixing the clunky design of select is a good
idea, though.

Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/