eCryptFS possible circular locking

From: Cyrill Gorcunov
Date: Thu Jun 26 2008 - 09:59:52 EST

Next message: Andi Kleen: "Re: ABI change for device drivers using future AVX instruction set"
Previous message: Vegard Nossum: "Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference"
Next in thread: Michael Halcrow: "[PATCH] eCryptfs: Remove unnecessary mux fromecryptfs_init_ecryptfs_miscdev()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Michael,

there is potential circularo locking in eCryptFS. Actually, I think Ingo
boots catched it. (Original report from Ingo was about a month ago).
Here is a log:

---

> [Ingo Molnar - Fri, May 09, 2008 at 11:44:23AM +0200]
> |
> | * Ingo Molnar <mingo@xxxxxxx> wrote:
> |
> | > x86.git testing found the following "possible circular locking"
> | > lockdep warning, generated by the ecryptfs code:
> |
> | the correct URL for the config is:
> |
> | http://redhat.com/~mingo/misc/config-Wed_May__7_16_17_31_CEST_2008.bad
> |
> | this is reproducible, it happened a second time since i originally
> | reported it.
> |
> | Ingo
>
> Hi Ingo, I've just sent patch, could you test it, please?
>
> http://lkml.org/lkml/2008/5/10/20

that patch is now upstream - but see below there's still a circular
dependency.

Ingo

[ 21.530026]
[ 21.530026] =======================================================
[ 21.530026] [ INFO: possible circular locking dependency detected ]
[ 21.530026] 2.6.26-rc2-sched-devel.git #1205
[ 21.530026] -------------------------------------------------------
[ 21.530026] multipath.stati/1379 is trying to acquire lock:
[ 21.530026] (&ecryptfs_daemon_hash_mux){--..}, at: [<c020cca2>] ecryptfs_miscdev_open+0x22/0x140
[ 21.530026]
[ 21.530026] but task is already holding lock:
[ 21.530026] (misc_mtx){--..}, at: [<c02f4b92>] misc_open+0x22/0x120
[ 21.530026]
[ 21.530026] which lock already depends on the new lock.
[ 21.530026]
[ 21.530026]
[ 21.530026] the existing dependency chain (in reverse order) is:
[ 21.530026]
[ 21.530026] -> #1 (misc_mtx){--..}:
[ 21.530026] [<c01438aa>] __lock_acquire+0xc3a/0x1100
[ 21.530026] [<c0143de6>] lock_acquire+0x76/0xa0
[ 21.530026] [<c05d032a>] mutex_lock_nested+0x7a/0x280
[ 21.530026] [<c02f4cb5>] misc_register+0x25/0x140
[ 21.530026] [<c020cb5c>] ecryptfs_init_ecryptfs_miscdev+0x2c/0x60
[ 21.530026] [<c020c5fe>] ecryptfs_init_messaging+0x1ee/0x280
[ 21.530026] [<c0837662>] ecryptfs_init+0xc2/0x1b0
[ 21.530026] [<c0824852>] kernel_init+0x132/0x250
[ 21.530026] [<c010365f>] kernel_thread_helper+0x7/0x10
[ 21.530026] [<ffffffff>] 0xffffffff
[ 21.530026]
[ 21.530026] -> #0 (&ecryptfs_daemon_hash_mux){--..}:
[ 21.530026] [<c01436cd>] __lock_acquire+0xa5d/0x1100
[ 21.530026] [<c0143de6>] lock_acquire+0x76/0xa0
[ 21.530026] [<c05d032a>] mutex_lock_nested+0x7a/0x280
[ 21.530026] [<c020cca2>] ecryptfs_miscdev_open+0x22/0x140
[ 21.530026] [<c02f4bf2>] misc_open+0x82/0x120
[ 21.530026] [<c0187c26>] chrdev_open+0x86/0x150
[ 21.530026] [<c0183289>] __dentry_open+0xa9/0x260
[ 21.530026] [<c0183487>] nameidata_to_filp+0x47/0x60
[ 21.530026] [<c018fda4>] do_filp_open+0x184/0x7a0
[ 21.530026] [<c01830aa>] do_sys_open+0x4a/0xb0
[ 21.530026] [<c018317e>] sys_open+0x2e/0x40
[ 21.530026] [<c010330e>] syscall_call+0x7/0xb
[ 21.530026] [<ffffffff>] 0xffffffff
[ 21.530026]
[ 21.530026] other info that might help us debug this:
[ 21.530026]
[ 21.530026] 1 lock held by multipath.stati/1379:
[ 21.530026] #0: (misc_mtx){--..}, at: [<c02f4b92>] misc_open+0x22/0x120
[ 21.530026]

---

Some analisys shows the following interesting things -

Call chain (if being compiled into the kernel)
----------------------------------------------

ecryptfs_init
ecryptfs_init_messaging
ecryptfs_init_ecryptfs_miscdev
mutex_lock(&ecryptfs_daemon_hash_mux);
misc_register(&ecryptfs_miscdev);
mutex_lock(&misc_mtx);
mutex_unlock(&misc_mtx);
1 --->
mutex_unlock(&ecryptfs_daemon_hash_mux);

As only we have misc device registered it is valid to open it.
And that is happened by another proccess

chrdev_open
misc_open
mutex_lock(&misc_mtx);
ecryptfs_miscdev_open
mutex_lock(&ecryptfs_daemon_hash_mux);

If this happens at point (1) /ie chrdev_open called at this point/
we are getting to circular warning as on top. I'm not sure how to handle this
in more elegant way but I think we should defer all calls to misc
device until eCryptFS has been initialized. Could you take a look,
please?

- Cyrill -
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andi Kleen: "Re: ABI change for device drivers using future AVX instruction set"
Previous message: Vegard Nossum: "Re: v2.6.26-rc7: BUG: unable to handle kernel NULL pointer dereference"
Next in thread: Michael Halcrow: "[PATCH] eCryptfs: Remove unnecessary mux fromecryptfs_init_ecryptfs_miscdev()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]