Re: [PATCH 1/6] Extend completions to provide XFS object flush requirements

From: Dave Chinner
Date: Thu Jun 26 2008 - 09:11:41 EST


On Thu, Jun 26, 2008 at 06:40:09AM -0600, Matthew Wilcox wrote:
> On Thu, Jun 26, 2008 at 10:21:12PM +1000, Dave Chinner wrote:
> > On Thu, Jun 26, 2008 at 05:42:42AM -0600, Matthew Wilcox wrote:
> > > Then let's leave it as a semaphore. You can get rid of the sema_t if
> > > you like, but I don't think that turning completions into semaphores is
> > > a good idea (because it's confusing).
> >
> > So remind me what the point of the semaphore removal tree is again?
>
> To remove the semaphores which don't need to be semaphores any more.

Or shouldn't be semaphores in the first place?

> > As Christoph suggested, I can put this under another API that
> > is implemented using completions. If I have to do that in XFS,
> > so be it....
>
> You could, yes. But you could just use completions directly ...

Not that I can see.

> > The main reason for this is that we've just uncovered the fact that the
> > way XFS uses semaphores is completely unsafe [*] on x86/x86_64 for
> > kernels prior to the new generic semaphores.
> >
> > [*] 2.6.20 panics in up() because of this race when I/O completion
> > (the up call) races with a simultaneous down() (iowaiter):
> >
> > T1                      T2
> > up()                    down()
> >                         kmem_free()
> >
> > When the down() call completes, the up() call can still be
> > referencing the semaphore, and hence if we free the structure after
> > the down call then the up() will reference freed memory. This is
> > probably the cause of many unexplained log replay or unmount panics
> > that we've been hitting for years with buffers that have been freed while
> > apparently still in use....
>
> This is exactly the kind of thing completions were supposed to be used
> for. T1 should be calling complete() and T2 should be calling
> wait_for_completion().
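
To make the complete()/wait_for_completion() suggestion concrete against the up()/kmem_free() race, here is a userspace sketch built on POSIX threads. It is an analogue, not kernel code: struct ucompletion, ucomplete(), uwait_for_completion(), struct io_obj and run_io_and_free() are hypothetical names. The property the sketch relies on is that the completer only touches the object while holding its internal lock, and the waiter cannot return until that lock has been dropped, so the waiter may free the object immediately afterwards. This assumes a pthreads implementation that honours POSIX's rule that an unlocked mutex may be destroyed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a kernel struct completion. */
struct ucompletion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;
};

static void ucompletion_init(struct ucompletion *x)
{
	pthread_mutex_init(&x->lock, NULL);
	pthread_cond_init(&x->cond, NULL);
	x->done = 0;
}

static void ucompletion_destroy(struct ucompletion *x)
{
	pthread_mutex_destroy(&x->lock);
	pthread_cond_destroy(&x->cond);
}

/* T1 side (like complete()): *x is only touched while x->lock is held. */
static void ucomplete(struct ucompletion *x)
{
	pthread_mutex_lock(&x->lock);
	x->done++;
	pthread_cond_signal(&x->cond);
	pthread_mutex_unlock(&x->lock);
	/* No access to *x after the unlock. */
}

/* T2 side (like wait_for_completion()): returns only after T1 dropped x->lock. */
static void uwait_for_completion(struct ucompletion *x)
{
	pthread_mutex_lock(&x->lock);
	while (x->done == 0)
		pthread_cond_wait(&x->cond, &x->lock);
	assert(x->done > 0); /* invariant after the wait loop */
	x->done--;
	pthread_mutex_unlock(&x->lock);
}

/* Hypothetical I/O object embedding its completion, standing in for a buffer. */
struct io_obj {
	struct ucompletion io_done;
	int result;
};

/* T1: I/O completion handler. */
static void *io_complete_thread(void *arg)
{
	struct io_obj *obj = arg;

	obj->result = 42;          /* pretend the I/O has finished */
	ucomplete(&obj->io_done);  /* last access to obj */
	return NULL;
}

/* T2: issue the "I/O", wait for it, then free the object straight away. */
static int run_io_and_free(void)
{
	struct io_obj *obj = malloc(sizeof(*obj));
	pthread_t t;
	int result;

	if (!obj)
		return -1;
	ucompletion_init(&obj->io_done);
	if (pthread_create(&t, NULL, io_complete_thread, obj) != 0) {
		ucompletion_destroy(&obj->io_done);
		free(obj);
		return -1;
	}
	uwait_for_completion(&obj->io_done);
	result = obj->result;
	ucompletion_destroy(&obj->io_done);
	free(obj);                 /* the kmem_free() step from the diagram */
	pthread_join(t, NULL);     /* T1 no longer references obj */
	return result;
}
```

Calling run_io_and_free() returns 42 here. Note that T2 frees the object before joining T1: once ucomplete() has returned from its unlock, T1 has no remaining reference to it.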

Yes, certainly. But, as should be obvious by now, completions don't
quite fit the bill for XFS - they only work for *synchronisation*
after the I/O. XFS needs *exclusion* during the I/O as well as
*synchronisation* after the I/O. The completion extensions provided the
exclusion part of the deal. How else do you suggest I implement
this?
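
One way to read the "exclusion during the I/O plus synchronisation after the I/O" requirement is a per-object flush lock built on a completion that starts with one token. The sketch below is a hedged userspace analogue on POSIX threads, not the patch's code: flush_lock_init(), flush_lock(), flush_trylock(), flush_unlock(), flush_is_locked() and flush_and_wait() are hypothetical names that loosely mirror wait_for_completion(), try_wait_for_completion() and complete(). Taking the token gives exclusion; the I/O completion path hands it back; anyone who needs to wait for the flush takes and returns the token.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel struct completion. */
struct ucompletion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;
};

/* Start "unlocked": one token available means no flush is in progress. */
static void flush_lock_init(struct ucompletion *x)
{
	pthread_mutex_init(&x->lock, NULL);
	pthread_cond_init(&x->cond, NULL);
	x->done = 1;
}

/* Exclusion (like wait_for_completion()): take the token, sleeping if needed. */
static void flush_lock(struct ucompletion *x)
{
	pthread_mutex_lock(&x->lock);
	while (x->done == 0)
		pthread_cond_wait(&x->cond, &x->lock);
	x->done--;
	pthread_mutex_unlock(&x->lock);
}

/* Non-blocking exclusion (like try_wait_for_completion()). */
static bool flush_trylock(struct ucompletion *x)
{
	bool got;

	pthread_mutex_lock(&x->lock);
	got = x->done > 0;
	if (got)
		x->done--;
	pthread_mutex_unlock(&x->lock);
	return got;
}

/* Release (like complete()), typically called from I/O completion. */
static void flush_unlock(struct ucompletion *x)
{
	pthread_mutex_lock(&x->lock);
	assert(x->done == 0); /* releasing a lock that is not held is a bug */
	x->done++;
	pthread_cond_signal(&x->cond);
	pthread_mutex_unlock(&x->lock);
}

/* True while a flush holds the token. */
static bool flush_is_locked(struct ucompletion *x)
{
	bool locked;

	pthread_mutex_lock(&x->lock);
	locked = (x->done == 0);
	pthread_mutex_unlock(&x->lock);
	return locked;
}

/* Simulated I/O completion: drops the flush lock taken by the submitter. */
static void *io_done_thread(void *arg)
{
	flush_unlock(arg);
	return NULL;
}

/* Flush an object, then wait for that flush to finish. */
static bool flush_and_wait(struct ucompletion *f)
{
	pthread_t t;

	if (!flush_trylock(f))
		return false;        /* exclusion: another flush is in progress */
	if (pthread_create(&t, NULL, io_done_thread, f) != 0) {
		flush_unlock(f);
		return false;
	}
	flush_lock(f);           /* synchronisation: sleep until the I/O is done */
	flush_unlock(f);         /* hand the token back for the next flusher */
	pthread_join(t, NULL);
	return true;
}
```

With the token starting at 1, the completion behaves like a binary semaphore whose release can come from I/O completion context. That overlap with semaphore semantics is what makes the extension contentious in this thread, yet the release path still only touches the object under its internal lock, which is what avoids the up()/kmem_free() race.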

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx