Optimising poll(2)

Richard Gooch (rgooch@atnf.CSIRO.AU)
Sat, 23 Aug 1997 18:25:30 +1000


Hi, all. I've been doing some more thinking about optimising the
poll() syscall (these ideas would also apply to the proposed poll2()
syscall). In a previous message I reported that it takes 2.9
milliseconds to do a zero timeout poll() for 1021 file descriptors on
a Pentium 100 (this is with the recent patch to poll() I posted to
avoid wait table manipulation).
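
For anyone wanting to take a comparable measurement, here is a minimal
userspace sketch. It is not the harness behind the 2.9 millisecond figure
above: the helper name time_zero_timeout_poll() is made up for
illustration, and it assumes a POSIX system with clock_gettime(). All
descriptors are dup()s of one idle pipe, so no descriptor is ever ready
and only the scan itself is timed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average seconds per zero-timeout poll() over nfds
 * descriptors, or -1.0 if setup or poll() fails. */
double time_zero_timeout_poll(int nfds, int iterations)
{
    int pipefd[2];
    struct pollfd *fds;
    struct timespec start, end;
    int opened = 0;
    double result = -1.0;

    assert(nfds > 0 && iterations > 0);
    if (pipe(pipefd) != 0)
        return -1.0;
    fds = calloc((size_t)nfds, sizeof *fds);
    if (fds == NULL)
        goto out_pipe;

    /* Every entry watches a dup() of the idle pipe's read end. */
    for (opened = 0; opened < nfds; opened++) {
        fds[opened].fd = dup(pipefd[0]);
        if (fds[opened].fd < 0)
            goto out_fds;
        fds[opened].events = POLLIN;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        if (poll(fds, (nfds_t)nfds, 0) < 0)   /* timeout 0: never sleeps */
            goto out_fds;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    result = ((double)(end.tv_sec - start.tv_sec)
              + (double)(end.tv_nsec - start.tv_nsec) / 1e9) / iterations;

out_fds:
    for (int i = 0; i < opened; i++)
        close(fds[i].fd);
    free(fds);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Calling time_zero_timeout_poll(1021, 1000) would mirror the descriptor
count quoted above, provided the process's open-file limit allows it.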

I think that it is reasonable to bring this down to around 1
millisecond.

I think it is important to bring down the time taken for the zero
timeout case, since poll() effectively does a zero timeout poll when
the process is woken up (to check which fds have activity), so this
code path is taken often.

What I propose would change the way polling is done in the kernel,
which of course involves a fair bit of work, but is hopefully worth
it. The basic idea is to define a new field (called, say,
"poll_events") inside the "struct file" structure which contains a
poll event mask, just like what you get back from poll(). It would be
the responsibility of each driver to add bits like POLLIN when data is
available for reading and POLLOUT when data can be written without
blocking, and of course all the other defined bits.

Defining this new field would then allow poll() to process these
fields directly, avoiding calling the indirect poll function in the
file operations structure and whatever that function may in turn
call. This would yield a drastic reduction in the time taken for
poll() to scan the list of fds for activity.
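
To make the idea concrete, here is a toy userspace model of the scan. It
is only a sketch under stated assumptions: "struct toy_file" stands in
for the kernel's "struct file", and toy_set_events(), toy_clear_events()
and scan_direct() are hypothetical names, not existing kernel functions.
The point is that the scan reads a mask per descriptor and makes no
indirect call.

```c
#include <assert.h>
#include <poll.h>

/* Toy stand-in for "struct file"; only the proposed field is modelled. */
struct toy_file {
    short poll_events;   /* proposed: kept up to date by the driver */
};

/* Driver side: set bits when the condition becomes true... */
void toy_set_events(struct toy_file *f, short bits)
{
    f->poll_events = (short)(f->poll_events | bits);
}

/* ...and clear them when it stops being true (e.g. data drained). */
void toy_clear_events(struct toy_file *f, short bits)
{
    f->poll_events = (short)(f->poll_events & ~bits);
}

/* poll() side: fill revents[] from the masks and return how many
 * descriptors have something to report. POLLERR and POLLHUP are reported
 * whether or not they were requested, as poll() does. */
int scan_direct(struct toy_file **files, const short *wanted,
                short *revents, int n)
{
    int ready = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        short mask = (short)(wanted[i] | POLLERR | POLLHUP);

        revents[i] = (short)(files[i]->poll_events & mask);
        if (revents[i] != 0)
            ready++;
    }
    return ready;
}
```

In this model the cost per descriptor is one load and one AND, which is
where the hoped-for saving would come from.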

We could take this further and avoid the need for a poll function in
the file operations structure altogether by defining some wait queue
fields in the "struct file" structure. poll() would add the process to
the appropriate wait queue itself, and each driver would schedule a
wakeup *after* it has updated the "poll_events" field. Voila! No more
need for driver-specific poll functions. It would provide a faster and
(IMHO) simpler interface for poll() support.
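
The ordering described above (update the mask first, wake afterwards) can
be sketched with a single-threaded toy model. Everything here is an
assumption for illustration: the wait queue is a plain linked list, a
"wakeup" just sets a flag, and the names toy_queue_waiter(),
toy_driver_update() and toy_ready() are hypothetical. A real kernel
version would need locking and would actually sleep.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

struct toy_waiter {
    int woken;                    /* set when the driver wakes us */
    struct toy_waiter *next;
};

struct toy_file {
    short poll_events;              /* proposed event mask */
    struct toy_waiter *wait_queue;  /* proposed per-file wait queue */
};

/* poll() side: put the caller on the file's wait queue. */
void toy_queue_waiter(struct toy_file *f, struct toy_waiter *w)
{
    assert(f != NULL && w != NULL);
    w->woken = 0;
    w->next = f->wait_queue;
    f->wait_queue = w;
}

/* poll() side: which of the wanted bits are currently set? */
short toy_ready(const struct toy_file *f, short wanted)
{
    return (short)(f->poll_events & (wanted | POLLERR | POLLHUP));
}

/* Driver side: update the mask FIRST, then wake every waiter, so that a
 * woken poller always sees the new bits when it rescans. */
void toy_driver_update(struct toy_file *f, short bits)
{
    struct toy_waiter *w = f->wait_queue;

    f->poll_events = (short)(f->poll_events | bits);
    f->wait_queue = NULL;
    while (w != NULL) {
        struct toy_waiter *next = w->next;

        w->woken = 1;
        w->next = NULL;
        w = next;
    }
}
```

A poller in this model queues itself, calls toy_ready(), and only
"sleeps" if the result is zero; after being woken it calls toy_ready()
again, which is the zero timeout rescan mentioned earlier.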

I *think* what I've proposed would work, but if I've missed something
fundamental, please enlighten me. I realise that this proposal would
require changes in many drivers (affecting a lot of people), so it
will require quite some effort, but I'd like to know whether people
think the basic principle is sound.

Regards,

Richard....