[...]
: > mmap() does have significant overhead
:
: CPU overhead. The question is whether that is the bottleneck.
In the for-what-it-is-worth department, you are venturing into charted
waters. SGI has done all of these tricks and more for years and I think
we can learn from their experiences. A few points:
1) Your model above still does a copy. So the cold cache numbers can
never be faster than 1/4 of memory speed: DMA in, copy, DMA out,
where the copy costs two passes over memory (a read and a write).
The best numbers are 1/2 memory speed: DMA in, DMA out.
2) On SGIs, for server-type operations, the mmap() is the bottleneck.
You are setting up and tearing down a virtual mapping that you don't
need: the ``currency'' you are dealing in at both ends is physical
pages, not virtual pages. This starts to become a bottleneck for
files smaller than 8K (Linux) or 32K (most other operating systems).
Linux is better because it is lighter.
3) You can read my writeup of how I thought I/O ought to be done after
doing it at SGI for a while; it's at
ftp://ftp.bitmover.com/pub/splice.ps.gz (also splice.ps)
It's not what I'd call detailed, but it is interesting reading and I'd
love to discuss the idea with you at length. Stephen Tweedie and I
started to talk about it at Linux Expo; maybe he has some thoughts
he cares to share?
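
To put numbers on point 1, here is a back-of-the-envelope sketch. It
assumes each DMA transfer costs one pass over memory and a CPU copy costs
two (a read plus a write); the names PASSES and fraction_of_memory_speed
are made up for illustration and come from nowhere in particular.

```python
# Cost of each step, counted in passes over memory for the same data.
# Assumption: a DMA touches memory once; a CPU copy reads it and writes it.
PASSES = {"dma_in": 1, "copy": 2, "dma_out": 1}


def fraction_of_memory_speed(steps):
    """Upper bound on cold-cache throughput as a fraction of memory bandwidth."""
    total_passes = sum(PASSES[step] for step in steps)
    return 1 / total_passes


# Path with one copy in the middle: 1 + 2 + 1 = 4 passes.
with_copy = fraction_of_memory_speed(["dma_in", "copy", "dma_out"])

# Path with no copy at all: 1 + 1 = 2 passes.
zero_copy = fraction_of_memory_speed(["dma_in", "dma_out"])

print(with_copy, zero_copy)  # 0.25 0.5
```

This is only a bound: it ignores cache effects, mapping setup and any
per-request overhead, all of which push the real numbers lower.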
--lm