Re: Question about select and poll system call

From: Linus Torvalds
Date: Thu Mar 16 2023 - 14:15:56 EST


On Mon, Mar 13, 2023 at 7:28 PM richard clark
<richard.xnu.clark@xxxxxxxxx> wrote:
>
> There're two questions about these system calls:
> 1. According to https://pubs.opengroup.org/onlinepubs/7908799/xsh/select.html:
> ERRORS
> [EINVAL]
> The nfds argument is less than 0 or greater than FD_SETSIZE.
> But the current implementation in Linux like:
> if (nfds > FD_SETSIZE)
> nfds = FD_SETSIZE
> What's the rationale behind this?

Basically, the value of FD_SETSIZE has changed, and different pieces
of the system have used different values over the years.

The exact value of FD_SETSIZE ends up actually depending on the
compile-time size of the "fd_set" variable, and both the kernel and
glibc (and presumably other C library implementations) have changed
over time.
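
As a rough illustration of that coupling (a userspace sketch, not
kernel code), the capacity of whatever 'fd_set' your C library ships
can be measured directly. The helper name fd_set_capacity_bits() is
made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <sys/select.h>

/* Hypothetical helper: how many descriptor bits this libc's fd_set holds. */
int fd_set_capacity_bits(void)
{
    int bits = (int)(sizeof(fd_set) * CHAR_BIT);

    /* The set must have room for every descriptor below FD_SETSIZE. */
    assert(bits >= FD_SETSIZE);
    return bits;
}
```

Whatever this returns is fixed when the program is compiled against a
given C library, independently of which kernel it later runs on.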

Just to give you a flavor of that history, 'select()' was implemented
back in early '92 in linux-0.12 (one of the greatest Linux releases of
all time - 0.12 was when Linux actually became *useful* to some
people).

And back then, we had this:

    typedef unsigned long fd_set;

which may seem a bit limiting today ("Only 32 bits??!?"), but to put
that in perspective, back then we also had this:

    #define NR_OPEN 20

and Linux-0.12 also made the *radical* change of raising NR_INODE from
32 to 64. Whee..

It was a very different time, in other words.

Now, imagine what happens when you increase those kinds of limits (as
we obviously did), and you do the library and kernel maintenance
separately. Some people might use a newer library with an older
kernel, and vice versa.

Doing that

    if (nfds > FD_SETSIZE)
        nfds = FD_SETSIZE;

basically allows you to at least limp along in that situation, where
maybe the library uses a 'fd_set' with thousands of bits, but the
kernel has a smaller limit.

Because you *will* find user programs that basically do

    select(FD_SETSIZE, ...)

even if they don't actually use all those bits. Returning an error
because the C library had a different idea of how big the fdset was
compared to the kernel would be bad.
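
For a concrete picture of that usage pattern, here is a minimal
userspace sketch. It watches one end of a pipe but passes FD_SETSIZE as
nfds rather than the tighter "highest fd + 1". The helper name
select_whole_set_on_pipe() is hypothetical, and the assert assumes the
pipe descriptors come out below FD_SETSIZE, as they normally do in a
small process.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: put one byte into a pipe, then ask select() about
 * the read end while passing FD_SETSIZE as nfds.
 * Returns select()'s result, or -1 if the setup fails.
 */
int select_whole_set_on_pipe(void)
{
    int fds[2];
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* check once, do not block */
    int ready;

    if (pipe(fds) != 0)
        return -1;

    /* FD_SET() is only defined for descriptors below FD_SETSIZE. */
    assert(fds[0] < FD_SETSIZE);

    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    FD_ZERO(&rfds);
    FD_SET(fds[0], &rfds);

    /* Lazy but common: the whole set size instead of fds[0] + 1. */
    ready = select(FD_SETSIZE, &rfds, NULL, NULL, &tv);

    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

Only one bit is set, so exactly one descriptor should be reported
ready, no matter how many bits the kernel is actually willing to scan.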

Now, the above is the *historical* reason for this all. The kernel
hasn't actually changed FD_SETSIZE in decades. We could say "by now,
if you use FD_SETSIZE larger than 1024, we'll return an error instead
of just truncating it".

But at the same time, while time has passed and we could make those
kinds of decisions, by now the POSIX spec is almost immaterial, and
compatibility with older versions of Linux is more important than
POSIX paper compatibility.

So there just isn't any reason to change any more.

> 2. Can we unify the two different system calls? For example, using
> poll(...) to implement the frontend select call(...), is there
> something I'm missing for current implementation?

No. select() and poll() are completely different animals. Trying to
unify them means having to convert from an array of file descriptors
to several arrays of bits. They are just very different interfaces.

Inside the kernel, the low-level implementation as far as individual
file descriptors are concerned is all unified already. Once you just
deal with one single file descriptor, we internally use a "->poll()"
thing. But to *get* to that individual file descriptor, select() and
poll() walk very different data structures.
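
To make the shape mismatch concrete, here is a userspace sketch of the
conversion in one direction, from a pollfd array to the three select()
bitmaps. It is not how the kernel does anything: the helper name
pollfds_to_fdsets() is hypothetical, and the event mapping (POLLIN to
the read set, POLLOUT to the write set, POLLPRI to the exception set)
is a deliberate simplification.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper: translate a pollfd array into select()-style
 * bitmaps. Returns the nfds value to hand to select() (highest fd + 1),
 * or -1 if some descriptor has no bit in an fd_set.
 */
int pollfds_to_fdsets(const struct pollfd *pfds, size_t count,
                      fd_set *rd, fd_set *wr, fd_set *ex)
{
    int nfds = 0;
    size_t i;

    assert(rd != NULL && wr != NULL && ex != NULL);

    FD_ZERO(rd);
    FD_ZERO(wr);
    FD_ZERO(ex);

    for (i = 0; i < count; i++) {
        int fd = pfds[i].fd;

        if (fd < 0)
            continue;       /* poll() skips negative descriptors */
        if (fd >= FD_SETSIZE)
            return -1;      /* poll() can express this fd, an fd_set cannot */

        /* Simplified mapping, assumed for illustration only. */
        if (pfds[i].events & POLLIN)
            FD_SET(fd, rd);
        if (pfds[i].events & POLLOUT)
            FD_SET(fd, wr);
        if (pfds[i].events & POLLPRI)
            FD_SET(fd, ex);

        if (fd + 1 > nfds)
            nfds = fd + 1;
    }
    return nfds;
}
```

Even this toy version has a failure case that poll() itself never
hits: a descriptor at or above FD_SETSIZE simply cannot be represented
on the select() side.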

Linus