Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

From: Nick Piggin
Date: Fri Sep 07 2007 - 17:34:31 EST


On Saturday 08 September 2007 17:25, Nick Piggin wrote:
> On Saturday 08 September 2007 07:12, Goswin von Brederlow wrote:
> > Nick Piggin <nickpiggin@xxxxxxxxxxxx> writes:
> > > On Saturday 08 September 2007 06:01, Goswin von Brederlow wrote:
> > >> b) a segment boundary
> > >
> > > This is done, as I said, because of the deadlock issue. While the issue
> > > is more completely fixed in -mm, a special case for kernel memory (eg.
> > > nfsd) is in the latest mainline kernels.
> >
> > Can you tell me where to get the fix from -mm? If it is completly
> > fixed there then that could make our patch obsolete.
>
> In the latest -mm series file, they start at
> mm-revert-kernel_ds-buffered-write-optimisation.patch
> ...
> and go to
> ocfs2-convert-to-new-aops.patch
>
> > >> What actually locks the page? Is it __grab_cache_page or
> > >> a_ops->prepare_write?
> > >
> > > prepare_write must be given a locked page.
> >
> > Then that means __grab_cache_page does return a locked page because
> > there is nothing between the two calls that would.
>
> That's right.
>
> > > No it would be included earlier. The "segment_eq" check should be
> > > allowing kernel writes (nfsd) to write multiple segments. If you have a
> > > patch which changes this significantly, then it would indicate the
> > > existing logic has a problem (or you've got a userspace application
> > > doing the writev, which should be fixed by the write_begin patches in
> > > -mm).
> >
> > I've got userspace application doing the writev. To be exact 14% of
> > the commits were saved by combining multiple segments into a single
> > prepare/write pair. Since the kernel segments don't fragment anymore
> > in 2.6.23-rc5 those savings must come from user space stuff.
> >
> > From the stats posted earlier you can see that there is a substantial
> > amount of calls with 6 segments all (alot) smaller than a page. Lots
> > of calls our patch or the write_begin/end will save.
>
> OK. The write_begin/write_end patchset is intrusive, no question. I'm not
> sure what you're intending to do with it. They have been tested in -mm for
> quite a while now, but just going with a simple patch that tries to copy
> more segments might be OK for you if you're backporting. The deadlock is
> pretty uncommon.

Lustre should probably have to be ported over to write_begin/write_end in
order to use it too. With the patches in -mm, if a filesystem is still using
prepare_write/commit_write, the vm reverts to a safe path which avoids
the deadlock (and allows multi-seg io copies), but copies the data twice.

OTOH, this is very likely to go upstream, so your filesystem will need to be
ported over sooner or later anyway.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/