Re: [PATCH 1/6] Extend completions to provide XFS object flush requirements

From: Matthew Wilcox
Date: Thu Jun 26 2008 - 08:55:19 EST


On Thu, Jun 26, 2008 at 10:21:12PM +1000, Dave Chinner wrote:
> On Thu, Jun 26, 2008 at 05:42:42AM -0600, Matthew Wilcox wrote:
> > Then let's leave it as a semaphore. You can get rid of the sema_t if
> > you like, but I don't think that turning completions into semaphores is
> > a good idea (because it's confusing).
>
> So remind me what the point of the semaphore removal tree is again?

To remove the semaphores which don't need to be semaphores any more.

> As Christoph suggested, I can put this under another API that
> is implemented using completions. If I have to do that in XFS,
> so be it....

You could, yes. But you could just use completions directly ...

> The main reason for this is that we've just uncovered the fact that the
> way XFS uses semaphores is completely unsafe [*] on x86/x86_64 for
> kernels prior to the new generic semaphores.
>
> [*] 2.6.20 panics in up() because of this race when I/O completion
> (the up call) races with a simultaneous down() (iowaiter):
>
> T1            T2
> up()          down()
>               kmem_free()
>
> When the down() call completes, the up() call can still be
> referencing the semaphore, and hence if we free the structure after
> the down call then the up() will reference freed memory. This is
> probably the cause of many unexplained log replay or unmount panics
> that we've been hitting for years with buffers that have been freed while
> apparently still in use....

This is exactly the kind of thing completions were supposed to be used
for. T1 should be calling complete() and T2 should be calling
wait_for_completion().
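
To illustrate the pattern, here is a rough userspace sketch, not kernel
code. It assumes POSIX threads; the demo_* names and struct demo_buf are
hypothetical stand-ins invented for this example, not the kernel's
struct completion or any XFS structure. The point it tries to show is
that the signalling side does all of its work on the object while
holding the lock, so once the waiter has been through that same lock it
can free the object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a completion. */
struct demo_completion {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    unsigned int    done;
};

/* Hypothetical object that is freed once its I/O has completed. */
struct demo_buf {
    struct demo_completion iodone;
    int                    data;
};

static void demo_init_completion(struct demo_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wait, NULL);
    c->done = 0;
}

/* T1 side: the counter update and the wakeup both happen under the lock. */
static void demo_complete(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}

/* T2 side: returns only after taking the lock that T1 has released. */
static void demo_wait_for_completion(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->wait, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

/* Plays the role of I/O completion (T1 in the diagram above). */
static void *demo_io_done(void *arg)
{
    struct demo_buf *bp = arg;

    bp->data = 42;                 /* result of the "I/O" */
    demo_complete(&bp->iodone);    /* last access to *bp by this thread */
    return NULL;
}

/* Plays the role of the iowaiter (T2): wait, read the result, free. */
static int demo_run(void)
{
    struct demo_buf *bp = malloc(sizeof(*bp));
    pthread_t t1;
    int seen;

    assert(bp != NULL);
    bp->data = 0;
    demo_init_completion(&bp->iodone);

    if (pthread_create(&t1, NULL, demo_io_done, bp) != 0) {
        free(bp);
        return -1;
    }

    demo_wait_for_completion(&bp->iodone);
    seen = bp->data;

    /* T1 no longer uses bp once the wait has returned. */
    pthread_cond_destroy(&bp->iodone.wait);
    pthread_mutex_destroy(&bp->iodone.lock);
    free(bp);

    pthread_join(t1, NULL);
    return seen;
}
```

The free deliberately happens before the join: nothing but the
completion orders the two threads, which is the property the old
up()/down() pairing did not give you.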

> Hence I'd prefer just to move completely away from semaphores for
> this flush interface. I'd like to start with getting the upstream
> code fixed in a sane manner so all the backports to older kernels
> start from the same series of commits.

That's sane.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."