Re: Timings for optimised poll(2)

Rogier Wolff (R.E.Wolff@BitWizard.nl)
Sun, 24 Aug 1997 09:33:48 +0200 (MET DST)


Richard Gooch wrote:
> There are a few reasons why you would not use the find_first_zero_bit
> function in a real-world application:
>
> 1) it assumes you are using select(2) and fd_sets. This has the old
> problem of hard-wired limits on the fd_set size. Today the limit is
> 1024. Tomorrow somebody wants more, so you have to recompile your
> kernel. If you want to manage a large number of descriptors, you would
> use poll(2) instead

The kernel has hard limits on the number of FDs per process, so
increasing the limit requires a recompilation of the kernel.

If you want to be "portable" across different limits, use
fd = fopen ("/proc/kernel/file-max","r");
fscanf (fd,"%d", &file_max);
close (fd);

fd_set = malloc (file_max / 8);

to allocate your fd_sets.

> 2) every time poll(2) or select(2) returns, you may get dozens of
> descriptors which are ready. How do you solve this? Call
> find_first_zero_bit for each active descriptor (clearing as you go)?

No. You use "find_next_zero_bit".

You have to rewrite the little assembly thingy anyway, because you want
to find the first_one_bit, which is left as an excercise to the reader.

> I've appended my latest version test programme. Compile with -O2. This
> contains a *realistic* scanning loop. It takes 1.5 milliseconds on a
> Pentium 100.

It takes 475 us on my pentium Pro. I can save 75 us (7 clocks/loop)
by not keeping a copy of the cbk_ptr uptodate. (cbk_ptr = cbk_array +
count; whenever you've found one). What actually happens is that gcc
recognizes it, and rewrites it again to keep an uptodate cbk_ptr. It
does this more efficiently than when you put the code there in the
first place.... Odd right?

I can save another 130 us by not keeping track of the "num_ready"
variable. If you expect several active fd's in one run, that is not
going to buy you anything anyway.

My previous program, and my modified version of yours both run around
the memory bandwidth of my computer. So storing 1 bit of information in
a 32 bit word is wasteful, and you can expect a 30 times performance
decrease if you do that...... As the pollfd structure is even three
times larger than that, because of cache issues, I'd actually expect
an even larger performance decrease..... Maybe that first_free_xx_bit
routine can still handle a little optimizing..... :-)

I think I saw you claim 1000 filedescriptors in 2.5 ms. It seems that
this was probably 10000. So My find_first_zero_bit hack is only 1e4
faster. Bummer. :-)

Regards,

Roger.