Re: A buffer/page cache question

Peter Moulder (reiter@netspace.net.au)
Wed, 29 Oct 1997 10:31:01 +1100 (EST)


(Note: I'm talking off the top of my head here, with no access to
sources. Also I'm not intimately involved with kernel development anyway.)

On Tue, 28 Oct 1997, Richard S. Gray wrote:

> 1. Is the cache divided into the five subcomponents listed below?
> a.) Buffer Cache
> b.) Page Cache
> c.) Swap Cache
> d.) Inode Cache
> e.) Directory Cache

They are different types of cache, yes. (Usually "the cache" refers to
the page cache.) I don't know anything about swap cache, or what's meant
by it.

> 2. What's bothering me is that I can't seem to track the flow of data
> through the various caches. I'm primarily concerned with the Buffer,
> Page and Swap caches. I've been told that only file system meta-data
> goes through the Buffer Cache. I'm not saying that this isn't correct.
> What I am saying is that I don't understand why. Doesn't this mean that
> if I wanted to read a single data base record, for instance, that I
> would be required to read an entire 4 kilobyte page just to get to that
> record?

Yes.

> 3. How does data get into the Page Cache? Does the Page Cache
> directly interface with the device driver? If the above assertion that
> only fs meta-data goes through the Buffer Cache is true then that would
> imply that the Page Cache does directly interface with the device
> driver. If such an interface exists is it found in the file_operations
> structure associated with the appropriate device_struct that is in turn
> found in the blkdevs vector?

Read the readpage operation in one of the filesystems; I have only looked
at ext2. All I/O with block devices goes through buffers, but for
readpage, "temporary buffers" are used. (They're discarded as soon as we
have the data in the page cache.) The I/O request goes onto a list which
is serviced by a block device driver.

> 4. I've been told and believe that the Page Cache is being used as a
> read ahead cache. This implies that if a requested page is read into
> memory from disk then subsequently several additional pages will be read
> in as well.

This is true for sequential file I/O, but not (at present) for mmap.

> I think the number of pages to be prefetched can be adjusted
> through mlord's "hdparm" utility.

I don't know about this, but the read-ahead window certainly adjusts as
more of the file is read in.

> Is this why, when a page fault is
> generated and the Physical Page Frame Number associated with the
> appropriate Page Table Entry is equal to zero, you check to ensure that
> the requested page has not been prefetched into the Page Cache?

Don't know much about mm in linux, but I do know that when we need data
from a certain offset in a file, we check to see if the page is already
in RAM by looking in a hash table (hash of inode pointer and offset, I
believe).

Perhaps this doesn't answer your question.

> As I
> understand this, we're saying "I've prefetched so many pages and those
> pages will remain in the cache until the number of free pages within the
> system reaches a predefined minimum." Once there are fewer than the
> predefined minimum number of free pages in the system, swapped (the
> kernel swap daemon) attempts to shrink the Buffer cache and the Page
> Cache to obtain the required number of free pages.

This is pretty much my understanding. Buffers can be freed from memory
earlier than that if they have served their purpose (e.g. transport
to/from page cache). The swap daemon is called kswapd; I don't really
know what that does. Most of the code for freeing up memory is not
directly associated with kswapd, though.

> I wonder if there
> is a way to prevent swapped from shrinking the Buffer/Page cache below a
> predefined specified limit of say 10 megs.

I doubt it. If someone needs memory, they need memory; there's usually
not much point returning ENOMEM to that someone just because you want I/O
to be fast. As you say above, we already keep about as much buffer/page
cache as we can anyway, not bothering to remove pages until someone does
request more memory.

> 5. If you explain anything please try to describe how data moves from/to
> the device driver from/to the OS.

For writing, the user data is copied to a buffer, and then some vm
routine is called to update the page cache. See a write operation, e.g.
ext2_file_write().

> I understand that the Buffer Cache
> makes use of a shared queue. Block device read/write requests are
> placed on this queue. The device driver then services the request and
> removes it from the queue.

This is covered in the Kernel Hackers' Guide, http://www.redhat.com:8001/
(is that port right?).

> I also understand that the file_operations
> structure associated with the block device special file can be used
> to request that data be sent to/from the block device. What I'm not
> sure of is if the Page Cache interfaces directly with the device driver.
> If the Page Cache does interface with the device driver then what
> mechanisms are used?

I believe that if you cat /dev/some-block-device then it's much the same
as if you cat some-regular-file, i.e. you access the page cache rather
than buffers. This is so that you can mmap the device. There should be
a readpage and writepage operation associated with that device (usually
left as NULL, I think, as the usual interface with drivers is just through
the block request stuff). I'm not too sure on this paragraph, but the KHG
covers some/all of this.

pjm.