Re: [patch 00/14] Page cache cleanup in anticipation of LargeBlocksize support

From: Andrew Morton
Date: Thu Jun 14 2007 - 18:50:04 EST


> On Thu, 14 Jun 2007 15:22:46 -0700 (PDT) Christoph Lameter <clameter@xxxxxxx> wrote:
> On Thu, 14 Jun 2007, Andrew Morton wrote:
>
> > With 64k pagesize the amount of memory required to hold a kernel tree (say)
> > will go from 270MB to 1400MB. This is not an optimisation.
>
> I do not think that the 100% users will do kernel compiles all day like
> we do. We likely would prefer 4k page size for our small text files.

There are many, many applications which use small files.

> > Several 64k pagesize people have already spent time looking at various
> > tail-packing schemes to get around this serious problem. And that's on
> > _server_ class machines. Large ones. I don't think
> > laptop/desktop/samll-server machines would want to go anywhere near this.
>
> I never understood the point of that exercise. If you have variable page
> size then the 64k page size can be used specific to files that benefit
> from it. Typically usage scenarios are video audio streaming I/O, large
> picture files, large documents with embedded images. These are the major
> usage scenarioes today and we suck the. Our DVD/CD subsystems are
> currently not capable of directly reading from these devices into the page
> cache since they do not do I/O in 4k chunks.

So with sufficient magical kernel heuristics or operator intervention, some
people will gain some benefit from 64k pagesize. Most people with most
workloads will remain where they are: shoving zillions of physically
discontiguous pages into fixed-size sg lists.

Whereas with contig-pagecache, all users on all machines with all workloads
will benefit from the improved merging.

> > > fsck times etc etc are becoming an issue for desktop
> > > systems
> >
> > I don't see what fsck has to do with it.
> >
> > fsck is single-threaded (hence no locking issues) and operates against the
> > blockdev pagecache and does a _lot_ of small reads (indirect blocks,
> > especially). If the memory consumption for each 4k read jumps to 64k, fsck
> > is likely to slow down due to performing a lot more additional IO and due
> > to entering page reclaim much earlier.
>
> Every 64k block contains more information and the number of pages managed
> is reduced by a factor of 16. Less seeks , less tlb pressure , less reads,
> more cpu cache and cpu cache prefetch friendly behavior.

argh. Everything you say is just wrong. A fsck involves zillions of
discontiguous small reads. It is largely seek-bound, so there is no
benefit to be had here. Your proposed change will introduce regressions by
causing larger amounts of physical reading and large amounts of memory
consumption.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/