Re: md deadlock (was Re: 2.6.18-mm2)

From: Neil Brown
Date: Fri Sep 29 2006 - 08:53:18 EST


On Friday September 29, a.p.zijlstra@xxxxxxxxx wrote:
> On Thu, 2006-09-28 at 13:54 +0200, Michal Piotrowski wrote:
>
> Looks like a real deadlock here. It seems to me #2 is the easiest to
> break.

I guess it could deadlock if you tried to add /dev/md0 as a component
of /dev/md0. I should probably check for that somewhere.
In other cases the array->member ordering ensures there is no
deadlock.

>
> static int md_open(struct inode *inode, struct file *file)
> {
> /*
> * Succeed if we can lock the mddev, which confirms that
> * it isn't being stopped right now.
> */
> mddev_t *mddev = inode->i_bdev->bd_disk->private_data;
> int err;
>
> if ((err = mddev_lock(mddev)))
> goto out;
>
> err = 0;
> mddev_get(mddev);
> mddev_unlock(mddev);
>
> check_disk_change(inode->i_bdev);
> out:
> return err;
> }
>
> mddev_get() is a simple atomic_inc(), and I fail to see how waiting for
> the lock makes any difference.

Hmm... I"m pretty sure I do want some sort of locking there - to make
sure that the
if (atomic_read(&mddev->active)>2) {
test in do_md_stop actually means something. However it does seem
that the locking I have doesn't really guarantee anything much.

But I really think that this locking order should be allowed. md
should ensure that there are never any loops in the array->member
ordering, and somehow that needs to be communicated to lockdep.

One of the items on my todo list is to sort out the lifetime rules of
md devices (once accessed, they currently never disappear). Getting
this locking right should be part of that.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/