Understanding buffers / buffer cache

From: Martin Steigerwald
Date: Thu Apr 14 2011 - 12:24:07 EST


Please keep either linux-kernel or my address as cc, as I am only subscribed
to linux-kernel, not linux-mm.


Hi!

In this week's Linux performance analysis and tuning course, which I am
teaching, there have been detailed questions about what the Linux kernel uses
the memory for that free displays under "buffers".

I know as much:

- it is for buffers that have to be written to disk at some time (as opposed
to caches, which are for reads)

- it is somewhat related to the pdflush / flush-major:minor threads; XFS
doesn't use these, but uses xfsbufd / xfssyncd instead

- the observation is that it doesn't increase much on a simple dd, but
increases much more on a tar -xf linux-x.y.tar.gz (after an echo 3 >
/proc/sys/vm/drop_caches)

- the data to be written via dd instead shows up under Dirty: and then
Writeback: in /proc/meminfo
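
To watch the counters mentioned in the list above side by side, a small shell
helper like the following could be used. This is only a sketch: the function
name meminfo_fields and its optional file argument are made up for
illustration; the field names are the ones /proc/meminfo prints.

```shell
# Print the Buffers, Cached, Dirty and Writeback lines of /proc/meminfo.
# The optional first argument (an assumption, handy for testing) names an
# alternative file in the same format.
meminfo_fields() {
    awk '
        # Anchoring at line start and including the colon avoids matching
        # SwapCached: or WritebackTmp:.
        /^(Buffers|Cached|Dirty|Writeback):/ { print $1, $2, $3 }
    ' "${1:-/proc/meminfo}"
}

# Show the current values once, if /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    meminfo_fields
fi
```

Calling it in a loop (for example "while sleep 1; do meminfo_fields; echo;
done") in a second terminal while the dd or tar runs makes the difference
between the two workloads visible.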


Thus I thought buffers were mainly related to metadata stuff.


But one course member (on cc) dug into the kernel source and found this:

- fs/block_dev.c:

long nr_blockdev_pages(void)
{
        struct block_device *bdev;
        long ret = 0;
        spin_lock(&bdev_lock);
        list_for_each_entry(bdev, &all_bdevs, bd_list) {
                ret += bdev->bd_inode->i_mapping->nrpages;
        }
        spin_unlock(&bdev_lock);
        return ret;
}

- include/linux/fs.h:

struct block_device {
        dev_t bd_dev; /* not a kdev_t - it's a search key */
        struct inode * bd_inode; /* will die */

[...]

struct inode {
        /* RCU path lookup touches following: */
[...]
        struct address_space *i_mapping;
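
If "buffers" is indeed fed by nr_blockdev_pages(), as the code above suggests,
then reading a block device directly should make it grow, while reading a
regular file should mostly show up under Cached. A rough way to check this is
sketched below. The helper names meminfo_kb and field_delta and the MEMINFO
override are hypothetical, and /dev/sdX and the file path are placeholders
(reading a raw device needs root).

```shell
# Print the value (in kB) of one /proc/meminfo field, e.g. "Buffers".
# MEMINFO is an assumed override for testing; it defaults to /proc/meminfo.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${MEMINFO:-/proc/meminfo}"
}

# Run a command and report how much the given field changed while it ran.
# The command's stdout is discarded; its stderr (e.g. dd statistics) is kept.
field_delta() {
    field=$1
    shift
    before=$(meminfo_kb "$field")
    "$@" >/dev/null
    after=$(meminfo_kb "$field")
    echo "$field: $before kB -> $after kB (delta $((after - before)) kB)"
}

# Example (placeholder device, needs root):
#   field_delta Buffers dd if=/dev/sdX of=/dev/null bs=1M count=100
# For comparison, against a regular file (placeholder path):
#   field_delta Cached dd if=/path/to/large/file of=/dev/null bs=1M count=100
```

Other activity on the machine changes these counters as well, so the numbers
are only indicative; dropping caches first and using a reasonably large read
makes the effect easier to see.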


- And then this in lots of places:

martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h: struct address_space *i_mapping;
./include/linux/fs.h: invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c: inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c: ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c: ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]

including various filesystems, where it seems to be used for metadata *and*
file I/O as well as for "journal" / COW I/O. For example:

./fs/btrfs/inode.c: page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c: inode->i_mapping, start,
./fs/btrfs/inode.c: inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c: inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c: !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c: filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c: pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c: current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c: filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c: inode->i_mapping,
./fs/btrfs/file.c: invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c: filemap_flush(inode->i_mapping);




So what exactly are buffers used for? Is there any up-to-date and detailed
documentation, howto or explanation available? Most hits I found via search
engines are either quite short and vague or relate to really old kernel
versions.

Is there any detailed explanation available on how - as in which steps - the
Linux kernel writes certain kinds of data like

- inode / metadata traffic
- dirty pages (ok, via pdflush / flush, as long as one process doesn't overuse
it)
- I/O from processes by using system functions like write()
- direct I/O

Or do you have any hints on what source files to read in order to understand
more regarding these questions?

Thanks,
--
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90
