Re: [00/17] Large Blocksize Support V3

From: David Chinner
Date: Fri Apr 27 2007 - 00:21:40 EST


On Thu, Apr 26, 2007 at 07:53:57PM -0700, Andrew Morton wrote:
> On Fri, 27 Apr 2007 12:27:31 +1000 David Chinner <dgc@xxxxxxx> wrote:
> > On Thu, Apr 26, 2007 at 07:04:38PM -0700, Andrew Morton wrote:
> > > On Tue, 24 Apr 2007 15:21:05 -0700 clameter@xxxxxxx wrote:
> > > Also, afaict your important requirements would be met by retaining
> > > PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by
> > > physically contiguous pages
> >
> > Sure, that addresses the larger I/O side of things, but it doesn't address
> > the large filesystem blocksize issues that can only be solved with some kind
> > of page aggregation abstraction.
>
> a) That wasn't a part of Christoph's original rationale list, so forgive
> me for thinking it is not so important and got snuck in post-facto when
> things got tough.

I've been pushing Christoph to do something like this for more than a year
purely so we can support large block sizes in XFS. He's got other reasons
for wanting to do this, but that doesn't mean that the large filesystem
blocksize issue is any less important.

> blocksizes via this scheme - instantiate and lock four pages and go for
> it.

So now how do you get block aligned writeback? Or make sure that truncate
doesn't race on a partial *block* truncate? You basically have to
jump through nasty, nasty hoops to handle corner cases that are introduced
because the generic code can no longer reliably lock out access to a
filesystem block.
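
To make the problem concrete, here is a small userspace sketch (not kernel
code) of the index arithmetic involved. It assumes 4k pages and a 16k
filesystem block, and every name in it is hypothetical, chosen only for
illustration. It shows how a page-granular range has to be widened to whole
blocks before writeback or truncate can treat a block as one unit.

```c
#include <assert.h>

/* Assumed sizes, for illustration only: 4k pages, 16k filesystem blocks. */
#define PAGE_BYTES	4096UL
#define BLOCK_BYTES	16384UL
#define PAGES_PER_BLOCK	(BLOCK_BYTES / PAGE_BYTES)

/* Hypothetical helper: index of the page containing a byte offset. */
static unsigned long page_of_offset(unsigned long offset)
{
	return offset / PAGE_BYTES;
}

/* Hypothetical helper: first page index of the block holding this page. */
static unsigned long block_first_page(unsigned long page_index)
{
	return page_index - (page_index % PAGES_PER_BLOCK);
}

/*
 * Hypothetical helper: widen an inclusive page range [*start, *end] so
 * that it begins and ends on filesystem block boundaries.
 */
static void align_range_to_blocks(unsigned long *start, unsigned long *end)
{
	assert(*start <= *end);
	*start = block_first_page(*start);
	*end = block_first_page(*end) + PAGES_PER_BLOCK - 1;
}
```

Under these assumptions, a request to write back pages 5-9 widens to pages
4-11, and a truncate to byte offset 22000 lands in page 5 but touches the
block held in pages 4-7. All four of those pages would have to be locked
together, which is exactly what page-at-a-time generic code does not do.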

Eventually you end up with something like fs/xfs/linux-2.6/xfs_buf.c and
doing everything inside the filesystem because it's the only sane
way to serialise access to these aggregated structures. This is
the way XFS used to work in its data path, and we all know how long
and loud people complained about that.....

A filesystem specific aggregation mechanism is not a palatable solution
here because it drives filesystems away from being able to use generic
code.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group