[RFC] Badness in __mutex_unlock_slowpath with XFS stress tests

From: Suzuki
Date: Thu Mar 09 2006 - 02:08:56 EST


Hi all,


I have been working on an issue where a "Badness in
__mutex_unlock_slowpath" warning (with a stack trace) is reported while
running FS stress tests on XFS on a 2.6.16-rc5 kernel.

The dmesg output looks like:

Badness in __mutex_unlock_slowpath at kernel/mutex.c:207
[<c0103c0c>] show_trace+0x20/0x22
[<c0103d4b>] dump_stack+0x1e/0x20
[<c0473f1f>] __mutex_unlock_slowpath+0x12a/0x23b
[<c0473938>] mutex_unlock+0xb/0xd
[<c02a5720>] xfs_read+0x230/0x2d9
[<c02a1bed>] linvfs_aio_read+0x8d/0x98
[<c015f3df>] do_sync_read+0xb8/0x107
[<c015f4f7>] vfs_read+0xc9/0x19b
[<c015f8b2>] sys_read+0x47/0x6e
[<c0102db7>] sysenter_past_esp+0x54/0x75


This happens with XFS DIO reads. xfs_read() holds the i_mutex and calls
__generic_file_aio_read(), which falls into __blockdev_direct_IO() with
the DIO_OWN_LOCKING flag (since XFS does its own locking). For READs
under DIO_OWN_LOCKING, __blockdev_direct_IO() releases the i_mutex.
When control returns to xfs_read(), it tries to unlock the i_mutex
(which is now already unlocked), causing the "Badness".
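To make the imbalance concrete, here is a simplified sketch of the call
flow (hand-abbreviated for illustration, not the literal 2.6.16 source):

/* Simplified sketch of the double-unlock path. */

/* in xfs_read(), for a sync direct read: */
mutex_lock(&inode->i_mutex);
ret = __generic_file_aio_read(iocb, iovp, segs, &offset);
        /* ... which eventually reaches __blockdev_direct_IO() with
         * DIO_OWN_LOCKING; for READs that path does:
         *      mutex_unlock(&inode->i_mutex);
         */
mutex_unlock(&inode->i_mutex);  /* already unlocked -> "Badness" */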

The possible solution we can think of is to not unlock the i_mutex for
DIO_OWN_LOCKING in __blockdev_direct_IO(). This would only affect
DIO_OWN_LOCKING users (as of now, only XFS) issuing concurrent sync DIO
read requests. AIO read requests would not suffer from this, since they
return as soon as the DIO is submitted.

Another workaround would be to check mutex_is_locked() before trying to
unlock the i_mutex in xfs_read(). But that seems like an ugly hack. :(
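For reference, that check would look something like this at the end of
xfs_read() (hypothetical placement; mutex_is_locked() is a real API,
but the test-then-unlock is inherently racy, which is what makes it a
hack):

/* Hypothetical workaround in xfs_read(). Racy: another task could
 * acquire i_mutex between the check and the unlock, and we would
 * then release a mutex we do not own.
 */
if (unlikely(ioflags & IO_ISDIRECT)) {
        if (mutex_is_locked(&inode->i_mutex))
                mutex_unlock(&inode->i_mutex);
}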

Comments?


thanks,

Suzuki
