Re: [patch 00/14] Page cache cleanup in anticipation of LargeBlocksize support

From: Andrew Morton
Date: Thu Jun 14 2007 - 18:50:04 EST

> On Thu, 14 Jun 2007 15:22:46 -0700 (PDT) Christoph Lameter <clameter@xxxxxxx> wrote:
> On Thu, 14 Jun 2007, Andrew Morton wrote:
> > With 64k pagesize the amount of memory required to hold a kernel tree (say)
> > will go from 270MB to 1400MB. This is not an optimisation.
> I do not think that 100% of users will do kernel compiles all day like
> we do. We likely would prefer 4k page size for our small text files.

There are many, many applications which use small files.
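
To make the small-file cost concrete, here is a minimal sketch of the rounding arithmetic. It assumes each cached file occupies a whole number of pages with no tail packing. The file sizes are invented for illustration and are not measurements of a kernel tree; `cached_bytes` and `footprint` are hypothetical helper names.

```python
# Illustrative only: the file sizes below are made up, not measured
# from a real kernel tree.

def cached_bytes(file_size, page_size):
    """Bytes of page cache needed to hold one file in full.

    Assumes no tail packing: a partially filled last page still
    costs a whole page.
    """
    if file_size == 0:
        return 0
    pages = -(-file_size // page_size)  # ceiling division
    return pages * page_size


def footprint(file_sizes, page_size):
    """Total page cache consumed by caching every file in file_sizes."""
    return sum(cached_bytes(size, page_size) for size in file_sizes)


sizes = [300, 2_000, 5_000, 12_000, 70_000]  # hypothetical small files, in bytes
small = footprint(sizes, 4 * 1024)
large = footprint(sizes, 64 * 1024)
print(small, large, round(large / small, 2))
```

The smaller the files are relative to the page size, the larger the ratio gets, since every file below one page costs a full page regardless of its length.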

> > Several 64k pagesize people have already spent time looking at various
> > tail-packing schemes to get around this serious problem. And that's on
> > _server_ class machines. Large ones. I don't think
> > laptop/desktop/small-server machines would want to go anywhere near this.
> I never understood the point of that exercise. If you have variable page
> size then the 64k page size can be used specific to files that benefit
> from it. Typical usage scenarios are video/audio streaming I/O, large
> picture files, large documents with embedded images. These are the major
> usage scenarios today and we suck at them. Our DVD/CD subsystems are
> currently not capable of directly reading from these devices into the page
> cache since they do not do I/O in 4k chunks.

So with sufficient magical kernel heuristics or operator intervention, some
people will gain some benefit from 64k pagesize. Most people with most
workloads will remain where they are: shoving zillions of physically
discontiguous pages into fixed-size sg lists.

Whereas with contig-pagecache, all users on all machines with all workloads
will benefit from the improved merging.
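
The merging argument can be sketched as follows. This is a simplified model, not the kernel's scatter-gather code: it treats an I/O as a list of page frame numbers in submission order and coalesces physically adjacent neighbours into one segment. `merge_segments` is a hypothetical name, and real block-layer limits such as maximum segment size are ignored.

```python
def merge_segments(pfns):
    """Coalesce page frame numbers into (start_pfn, n_pages) runs.

    Only entries that are adjacent in I/O order AND physically
    consecutive are merged, which is why discontiguous pages each
    consume their own slot in a fixed-size list.
    """
    segments = []
    for pfn in pfns:
        if segments and segments[-1][0] + segments[-1][1] == pfn:
            start, count = segments[-1]
            segments[-1] = (start, count + 1)  # extend the current run
        else:
            segments.append((pfn, 1))  # start a new segment
    return segments


# Four physically contiguous pages collapse into one segment ...
print(merge_segments([10, 11, 12, 13]))
# ... while four scattered pages need four segments.
print(merge_segments([10, 42, 7, 99]))
```

In this model the number of list entries depends only on physical contiguity, not on the page size a file was configured with.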

> > > fsck times etc etc are becoming an issue for desktop
> > > systems
> >
> > I don't see what fsck has to do with it.
> >
> > fsck is single-threaded (hence no locking issues) and operates against the
> > blockdev pagecache and does a _lot_ of small reads (indirect blocks,
> > especially). If the memory consumption for each 4k read jumps to 64k, fsck
> > is likely to slow down due to performing a lot more additional IO and due
> > to entering page reclaim much earlier.
> Every 64k block contains more information and the number of pages managed
> is reduced by a factor of 16. Fewer seeks, less TLB pressure, fewer reads,
> more cpu cache and cpu cache prefetch friendly behavior.

argh. Everything you say is just wrong. A fsck involves zillions of
discontiguous small reads. It is largely seek-bound, so there is no
benefit to be had here. Your proposed change will introduce regressions by
causing larger amounts of physical reading and large amounts of memory
