Re: [PATCH] Re: Move of input drivers, some word needed from you

From: tytso@valinux.com
Date: Tue Aug 22 2000 - 10:21:09 EST


   From: David Woodhouse <dwmw2@infradead.org>

   The problem is that if you start to decouple the chipset driver from the
   code which knows how to access the chip, you end up with lots of horrible
   indirect function calls in the inner loops. This isn't really going to help
   improve performance - and the serial driver has one of the biggest problems
   w.r.t latency already.

The serial driver uses inline functions with explicit case statements,
and before you condemn such tactics, I suggest you actually benchmark
things. As long as everything is in cache (as it is for any tight
loop), given modern CPUs with call stacks, even indirect function
calls are pretty much cheap compared to the time it takes for characters
to arrive. Certainly the amount of time it takes to execute the case
statement is very, very small compared to the time it takes to do any
kind of serial I/O operation. (Remember, CPUs have been getting faster
and faster; serial speeds have stayed pretty much the same, or have at
most doubled or quadrupled over the past decade.)
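
For readers who want to benchmark the two styles being compared, here
is a minimal userspace sketch. It is not the serial driver's code: the
struct, the enum values and every function name are hypothetical, and a
plain byte array stands in for the chip's registers.

```c
#include <stdint.h>

/* Hypothetical access methods; stand-ins, not the kernel's definitions. */
enum io_type { IO_PORT, IO_MEM };

struct uart {
    enum io_type io_type;
    uint8_t *regs;       /* simulated register file */
    unsigned shift;      /* register spacing used by the IO_MEM case */
    uint8_t (*read_reg)(const struct uart *, unsigned); /* indirect variant */
};

/* Variant 1: inline function with an explicit case statement. */
static inline uint8_t uart_in_switch(const struct uart *u, unsigned offset)
{
    switch (u->io_type) {
    case IO_MEM:
        return u->regs[offset << u->shift];
    case IO_PORT:
    default:
        return u->regs[offset];
    }
}

/* Variant 2: one backend per access method, reached through a pointer. */
static uint8_t read_port(const struct uart *u, unsigned offset)
{
    return u->regs[offset];
}

static uint8_t read_mem(const struct uart *u, unsigned offset)
{
    return u->regs[offset << u->shift];
}

static inline uint8_t uart_in_indirect(const struct uart *u, unsigned offset)
{
    return u->read_reg(u, offset);
}
```

Both variants return the same byte for the same device; timing them in
a loop against the per-character arrival time is the benchmark being
suggested.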

Jim Gettys has a wonderful explanation of this effect in the X server.
It turns out that with branch prediction and the relative speed of CPU
vs. memory changing over the past decade, loop unrolling is pretty much
pointless. In fact, by eliminating all instances of Duff's Device from
the XFree86 4.0 server, the server shrank in size by _half_ _a_
_megabyte_ (!!!), and was faster as well, because the elimination of
all that excess code meant that the X server wasn't thrashing the cache
lines as much.
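
As a reminder of what is being removed, here is a sketch of Duff's
Device next to the plain loop it replaces. This is the common
memory-to-memory form written for illustration; it is not taken from
the XFree86 sources, and the function names are made up.

```c
#include <stddef.h>

/* The straightforward version: one byte per iteration. */
static void copy_plain(unsigned char *to, const unsigned char *from,
                       size_t count)
{
    while (count-- > 0)
        *to++ = *from++;
}

/*
 * Duff's Device: the loop body is unrolled eight times, and the switch
 * jumps into the middle of it to handle the count % 8 leftover bytes
 * on the first pass. Every case falls through on purpose.
 */
static void copy_duff(unsigned char *to, const unsigned char *from,
                      size_t count)
{
    size_t n;

    if (count == 0)
        return;
    n = (count + 7) / 8;        /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

The unrolled version is several times larger at every place it is
expanded, which is where the code-size and cache cost comes from.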

The bottom line is that our intuitive assumptions of what's fast and
what isn't can often be wrong, especially given how much CPUs have
changed over the past couple of years. For example, Rusty and I have
talked about this issue, and we have a sneaking suspicion that many of
the inline functions (yes, including in the serial driver) can and
should go away, and be turned into normal functions, for similar
reasons. Essentially, compared to cache misses, you can execute a
*large* number of instructions for "free".

                                                - Ted

P.S. The latency problems of the serial driver are completely unrelated
to this issue. They are caused by (a) IDE (and other drivers) masking
interrupts for long periods of time, and (b) the serial driver batching
characters and only calling the line discipline every clock tick. This
behaviour can be turned off by using the command "setserial /dev/ttyS0
low_latency", which will increase the CPU overhead --- on an 8250, if
you are receiving characters at 115200 bps, the line discipline code
will be called 11,520 times a second, instead of 100 times a second.
But you will have a very low-latency driver. Of course, if you're
reading characters in cooked mode, this CPU time is basically
completely wasted.
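
The 11,520 figure can be reproduced with a little arithmetic. The
sketch below assumes 8N1 framing (one start bit, eight data bits, one
stop bit, so ten bits on the wire per character) and a 100 Hz clock
tick; both are assumptions that match the numbers above, and the
function names are hypothetical.

```c
/*
 * Characters per second on an async line. Each character costs one
 * start bit plus its data and stop bits; parity is ignored here.
 */
static unsigned long chars_per_second(unsigned long bps,
                                      unsigned data_bits,
                                      unsigned stop_bits)
{
    return bps / (1 + data_bits + stop_bits);
}

/*
 * Whole characters that pile up between two clock ticks when the
 * driver batches and only calls the line discipline once per tick.
 */
static unsigned long chars_per_tick(unsigned long cps, unsigned hz)
{
    return cps / hz;
}
```

With these assumptions, chars_per_second(115200, 8, 1) is 11520, and
chars_per_tick(11520, 100) is 115, so batching trades roughly 115
line discipline calls per tick for a single one.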

Given that there are very few applications where latency actually
matters --- well-designed protocols like kermit, zmodem, or TCP use
windowing to avoid lockstep performance issues --- ASYNC_LOW_LATENCY
isn't turned on by default.
