Re: Buffer cache hints

Linus Torvalds (torvalds@cs.helsinki.fi)
Sat, 7 Sep 1996 10:06:07 +0300 (EET DST)


On Sat, 7 Sep 1996, Richard Gooch wrote:
>
> I do use mmap() sometimes, but I still have to swap the bytes (I
> just use mmap() and bcopy() instead of read() to read from the
> file). It would be so nice if the data was in host-natural form, but
> alas, no.
> I have one file which is 30 MBytes, and my disc rattles like crazy for
> a few minutes before all the data has been read and swap-copied into
> VM. If it wasn't for the unnecessary paging, this would take 15 to 20
> seconds with my machine with 64 MBytes of RAM.

You'd still be better off with mmap + byte swap in place, than with read
+ byte swap. Rationale:

With "read(large-area)" + "massage(large-area)", you end up swapping things
out _twice_. When you do the read, the data in the beginning of the read
buffer gets swapped out when the kernel has to copy the data to the end of
the read buffer, and then when you do the byte-order stuff it has to be
swapped in again (and the end of the read buffer gets swapped out).

If you do a mmap(MAP_PRIVATE, PROT_READ|PROT_WRITE), the kernel won't actually
read the data until you need it, so it will be read just once, and then
directly massaged without hitting swap in between. The kernel will start
swapping out the (massaged) pages by the time you've reached the end, but
you'd still have "won" one swap-out.
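
A minimal sketch of the mmap approach, assuming the data is an array of
32-bit words stored in the opposite byte order. The names swap32() and
map_and_swap() are made up for illustration, and the word size is an
assumption too; nothing here comes from Richard's actual program.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reverse the byte order of one 32-bit word (hypothetical helper). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Map a file privately and byte-swap its 32-bit words in place.
 * Pages are faulted in from the file as the loop touches them.
 * MAP_PRIVATE makes the writes copy-on-write, so the file on disk
 * is left unchanged. Trailing bytes that do not form a whole word
 * are left as they are.
 * Returns the mapping (release it with munmap) or NULL on failure.
 */
static uint32_t *map_and_swap(const char *path, size_t *nwords)
{
    struct stat st;
    void *map;
    uint32_t *data;
    size_t i, n;
    int fd;

    assert(path != NULL && nwords != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(uint32_t)) {
        close(fd);
        return NULL;
    }
    map = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return NULL;

    data = map;
    n = (size_t)st.st_size / sizeof(uint32_t);
    for (i = 0; i < n; i++)     /* forward pass: read once, swap at once */
        data[i] = swap32(data[i]);
    *nwords = n;
    return data;
}
```

The caller would do something like "d = map_and_swap(path, &n)" and later
"munmap(d, n * sizeof(uint32_t))".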

Also, if you _know_ that you'll then use the data in some specific sequence,
you can try to minimize this swapping stage by doing the byte swap in
reverse: that way when you have byte-swapped all the data you're likely to
have the start of the data buffer in memory (because that's the part you
touched most recently). NOTE: this only makes sense if you know that the swap
is a problem, because generally it's slower going backwards than forward if
there are no swap effects.
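
The reverse pass could look roughly like this, again assuming 32-bit
words and a buffer that is already mapped or read in. The function names
are hypothetical. Whether it helps depends entirely on whether you are
really hitting swap, so measure before keeping it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word (hypothetical helper). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Byte-swap the buffer from the last word down to the first, so that
 * the start of the buffer is the most recently touched part when the
 * pass finishes.
 */
static void swap_words_backwards(uint32_t *data, size_t nwords)
{
    size_t i = nwords;

    assert(data != NULL || nwords == 0);

    while (i-- > 0)             /* visits nwords-1, ..., 1, 0 */
        data[i] = swap32(data[i]);
}
```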

You can do the same thing with read (read in small chunks and do the data
massage in small chunks), but it's generally easier with mmap. And it's a lot
more likely that you'll see a mmap cache hint in the future than a buffer
cache hint..
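
For comparison, a sketch of the chunked read variant. CHUNK_BYTES and
read_and_swap() are invented for the example, the 64 kB chunk size is a
guess and not a tuned value, and the 32-bit word size is assumed as
before. Each chunk gets swapped right after it is read, while it should
still be in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define CHUNK_BYTES (64 * 1024)  /* hypothetical chunk size, not tuned */

/* Reverse the byte order of one 32-bit word (hypothetical helper). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Read up to len bytes from fd into dst, one chunk at a time, and
 * byte-swap each chunk's complete words before reading the next one.
 * Returns the number of bytes read, or -1 on a read error. EINTR is
 * not retried, to keep the sketch short.
 */
static ssize_t read_and_swap(int fd, uint32_t *dst, size_t len)
{
    unsigned char *bytes = (unsigned char *)dst;
    size_t got = 0;             /* bytes read so far */
    size_t swapped = 0;         /* words already swapped */

    assert(dst != NULL || len == 0);

    while (got < len) {
        size_t want = len - got;
        ssize_t r;

        if (want > CHUNK_BYTES)
            want = CHUNK_BYTES;
        r = read(fd, bytes + got, want);
        if (r < 0)
            return -1;
        if (r == 0)
            break;              /* end of file */
        got += (size_t)r;

        /* Swap every whole word that has arrived since the last pass. */
        for (; swapped < got / sizeof(uint32_t); swapped++)
            dst[swapped] = swap32(dst[swapped]);
    }
    return (ssize_t)got;
}
```

Tracking "swapped" separately from "got" keeps a short read that ends in
the middle of a word from being swapped too early.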

Linus