Re: Question about select and poll system call

From: richard clark
Date: Fri Mar 17 2023 - 04:29:35 EST


I have to confess I had reached *almost* the same conclusion after
thinking about this for a long time before seeing your reply, so I think
it's one of the best decisions we can make together. Thanks for the very
nice and patient explanation, and have a happy weekend, good guy :).
Anyone watching this, please feel free to raise a different opinion...

Anyway, some comments inline...

On Fri, Mar 17, 2023 at 2:15 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 13, 2023 at 7:28 PM richard clark
> <richard.xnu.clark@xxxxxxxxx> wrote:
> >
> > There're two questions about these system calls:
> > 1. According to https://pubs.opengroup.org/onlinepubs/7908799/xsh/select.html:
> > ERRORS
> > [EINVAL]
> > The nfds argument is less than 0 or greater than FD_SETSIZE.
> > But the current implementation in Linux like:
> > >     if (nfds > FD_SETSIZE)
> > >         nfds = FD_SETSIZE;
> > What's the rationale behind this?
>
> Basically, the value of FD_SETSIZE has changed, and different pieces
> of the system have used different values over the years.
>
> The exact value of FD_SETSIZE ends up actually depending on the
> compile-time size of the "fd_set" variable, and both the kernel and
> glibc (and presumably other C library implementations) have changed
> over time.
>
> Just to give you a flavor of that history, 'select()' was implemented
> back in early '92 in linux-0.12 (one of the greatest Linux releases of
> all time - 0.12 was when Linux actually became *useful* to some
> people).
>
> And back then, we had this:
>
> typedef unsigned long fd_set;
>
> which may seem a bit limiting today ("Only 32 bits??!?"), but to put
> that in perspective, back then we also had this:
>
> #define NR_OPEN 20
>
> and Linux-0.12 also did the *radical* change of changing NR_INODE from
> 32 to 64. Whee..
>
> It was a very different time, in other words.
>
> Now, imagine what happens when you increase those kinds of limits (as
> we obviously did), and you do the library and kernel maintenance
> separately. Some people might use a newer library with an older
> kernel, and vice versa.
>
> Doing that
>
>     if (nfds > FD_SETSIZE)
>         nfds = FD_SETSIZE;
>
> basically allows you to at least limp along in that situation, where
> maybe the library uses a 'fd_set' with thousands of bits, but the
> kernel has a smaller limit.
>
> Because you *will* find user programs that basically do
>
> select(FD_SETSIZE, ...)
>
> even if they don't actually use all those bits. Returning an error
> because the C library had a different idea of how big the fdset was
> compared to the kernel would be bad.
>
> Now, the above is the *historical* reason for this all. The kernel
> hasn't actually changed FD_SETSIZE in decades. We could say "by now,
> if you use FD_SETSIZE larger than 1024, we'll return an error instead
> of just truncating it".
>
> But at the same time, while time has passed and we could do those
> kinds of decisions, by now the POSIX spec is almost immaterial, and
> compatibility with older versions of Linux is more important than
> POSIX paper compatibility.
>
> So there just isn't any reason to change any more.
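
For anyone following along, the two policies discussed above can be put
side by side in a minimal user-space sketch. This is not the kernel code:
nfds_strict(), nfds_clamped(), nfds_policy_selftest() and
KERNEL_FD_SETSIZE are names made up for illustration, and 1024 is simply
the value mentioned above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's idea of the fd_set size. */
#define KERNEL_FD_SETSIZE 1024

/* Strict reading of the spec text: out-of-range nfds is an error. */
static int nfds_strict(int nfds)
{
    if (nfds < 0 || nfds > KERNEL_FD_SETSIZE)
        return -EINVAL;
    return nfds;
}

/* Lenient policy: negative is still an error, too large is truncated. */
static int nfds_clamped(int nfds)
{
    if (nfds < 0)
        return -EINVAL;
    if (nfds > KERNEL_FD_SETSIZE)
        nfds = KERNEL_FD_SETSIZE;
    return nfds;
}

/* A library built with a 4096-bit fd_set calling select(FD_SETSIZE, ...). */
static void nfds_policy_selftest(void)
{
    assert(nfds_strict(4096) == -EINVAL);             /* caller breaks */
    assert(nfds_clamped(4096) == KERNEL_FD_SETSIZE);  /* caller limps along */
    assert(nfds_clamped(-1) == -EINVAL);
}
```

With the strict variant, a program that never touches a descriptor above
1024 still fails just because its C library was built with a bigger
fd_set; with the clamped variant it keeps working.
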
>
> > 2. Can we unify the two different system calls? For example, using
> > poll(...) to implement the frontend select call(...), is there
> > something I'm missing for current implementation?
>
> No. select() and poll() are completely different animals. Trying to
> unify them means having to convert from an array of fd descriptors to
> several arrays of bits. They are just very different interfaces.

Technically, this kind of conversion is not as radical as it might seem
(I even think the performance cost can be ignored). The advantage is
that the maintainer needs to care about only one piece of code. The
already unified fd->poll(...) implementation is obvious evidence of this:
essentially the core is the same, just with a different skin. At the very
least, the difference in interface looks like a weak justification for
the current implementation.
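
To show what such a conversion involves, here is a rough user-space
sketch of select() layered on poll(). It is only an illustration under
stated assumptions, not how the kernel does it: select_via_poll(), fold()
and select_via_poll_selftest() are hypothetical names, the timeout is
given in poll()'s milliseconds, and the event mapping (POLLIN/POLLHUP/
POLLERR for readable, POLLOUT/POLLERR for writable, POLLPRI for
exceptional) is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* Keep the bit if ready, clear it otherwise; returns 1 if kept. */
static int fold(fd_set *set, int fd, int ready)
{
    if (!ready)
        FD_CLR(fd, set);
    return ready ? 1 : 0;
}

/*
 * Hypothetical user-space helper, not kernel code: select() emulated on
 * top of poll(). timeout_ms has poll() semantics (-1 blocks forever).
 * Returns the number of bits left set, or -1 with errno set.
 */
static int select_via_poll(int nfds, fd_set *rd, fd_set *wr, fd_set *ex,
                           int timeout_ms)
{
    struct pollfd *pfds;
    int fd, i, n = 0, count = 0;
    if (nfds < 0) {
        errno = EINVAL;
        return -1;
    }
    if (nfds > FD_SETSIZE)
        nfds = FD_SETSIZE;              /* truncate instead of failing */
    pfds = calloc(nfds > 0 ? (size_t)nfds : 1, sizeof(*pfds));
    if (!pfds)
        return -1;

    /* Step 1: walk the three bitmaps and build one pollfd array. */
    for (fd = 0; fd < nfds; fd++) {
        short events = 0;
        if (rd && FD_ISSET(fd, rd))
            events |= POLLIN;
        if (wr && FD_ISSET(fd, wr))
            events |= POLLOUT;
        if (ex && FD_ISSET(fd, ex))
            events |= POLLPRI;
        if (events) {
            pfds[n].fd = fd;
            pfds[n].events = events;
            n++;
        }
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0) {
        free(pfds);
        return -1;
    }

    /* Step 2: walk the array and fold revents back into the bitmaps. */
    for (i = 0; i < n; i++) {
        short ev = pfds[i].events, re = pfds[i].revents;
        fd = pfds[i].fd;
        if (re & POLLNVAL) {
            /* select() reports a bad fd as EBADF. The sets may already
             * be partly updated here; a careful version checks first. */
            free(pfds);
            errno = EBADF;
            return -1;
        }
        if (ev & POLLIN)
            count += fold(rd, fd, re & (POLLIN | POLLHUP | POLLERR));
        if (ev & POLLOUT)
            count += fold(wr, fd, re & (POLLOUT | POLLERR));
        if (ev & POLLPRI)
            count += fold(ex, fd, re & POLLPRI);
    }
    free(pfds);
    return count;
}

/* Self-check on a pipe: not readable while empty, readable after a write. */
static int select_via_poll_selftest(void)
{
    int p[2], before, after;
    fd_set rd;
    if (pipe(p) < 0)
        return -1;
    FD_ZERO(&rd);
    FD_SET(p[0], &rd);
    before = select_via_poll(p[0] + 1, &rd, NULL, NULL, 0);
    if (write(p[1], "x", 1) != 1)
        return -1;
    FD_SET(p[0], &rd);
    after = select_via_poll(p[0] + 1, &rd, NULL, NULL, 0);
    assert(before == 0 && after == 1 && FD_ISSET(p[0], &rd));
    close(p[0]);
    close(p[1]);
    return 0;
}
```

Even in this small form the wrapper has to walk the bitmaps once to build
the array and walk the array once more to rebuild the bitmaps, and it
allocates on every call, so the cost of the extra layer is worth measuring
before assuming it can be ignored.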

>
> Inside the kernel, the low-level implementation as far as individual
> file descriptors is concerned is all unified already. Once you just
> deal with one single file descriptor, we internally use a "->poll()"
> thing. But to *get* to that individual file descriptor, select() and
> poll() walk very different data structures.
>
> Linus