Re: recursive locking: epoll.

From: Nelson Elhage
Date: Sat Jul 30 2011 - 14:26:34 EST

Next message: Linus Torvalds: "Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1"
Previous message: Linus Torvalds: "Re: [PATCH 1/2] random: Add support for architectural random hooks"
In reply to: Paul Bolle: "Re: recursive locking: epoll."
Next in thread: Nelson Elhage: "Re: recursive locking: epoll."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Oof, this is kinda ugly.

I *believe* that as of

22bacca4 epoll: prevent creating circular epoll structures

the epoll locking is correct, with the rule that two ep->mtx's can be
locked recursively iff ep1 contains ep2 (possibly indirectly), and
that if ep1 contains ep2, ep1 will always be locked first. Since
22bacca4 eliminated the possibility of epoll cycles, this means there
is a well-defined lock order.

I *think* that for any static configuration of epoll file descriptors,
we can fix the problem by doing something like using the "call_nests"
parameter passed by ep_call_nested as the lock subkey, but I haven't
thought this through completely.

However, since that lock order is subject to change, and even
reversal, at runtime, I think the following (pathological) sequence of
userspace calls will trigger lockdep warnings, even though there is
never any risk of deadlock:

- Create epoll fds ep1 and ep2
- Add ep1 to ep2
- Do some operations that result in recursive locking
- Remove ep1 from ep2
- Add ep2 to ep1
- Do some operations that result in recursive locking

In fact, that program should trigger warnings even if we did the
pathological thing of using the address of the 'struct eventpoll' as
the subclass [1], since it is *literally the same two locks* that are
getting acquired in different orders at different times.

I also don't see a way to simplify the epoll locking without adding
more restrictions to how the API can be used. As far as I can tell,
the situation really is just that nasty.

- Nelson

[1] Never mind that the "subclass" is an unsigned int, so we can't
even do that directly on 64-bit systems.

On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote:
> (Sent to the addresses get_maintainer.pl suggested and to Davide and
> Nelson, because this is about code they cared about half a year ago.
> CC'ed to the addresses involved until now.)
>
> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote:
> > That number turned out to be 722472
> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).
>
> 0) This seems to be a lockdep false alarm. The cause is an epoll
> instance added to another epoll instance (ie, nesting epoll instances).
> Apparently lockdep isn't supplied enough information to determine what's
> going on here. Now there might be a number of ways to fix this. But
> after having looked at this for quite some time and updating the above
> bug report a number of times, I guessed that involving people outside
> those tracking that report might move things forward towards a solution.
> At least, I wasn't able to find a, well, clean solution.
>
> 1) The call chain triggering the warning with the nice
> *** DEADLOCK ***
>
> line can be summarized like this:
>
> sys_epoll_ctl
> mutex_lock epmutex
> ep_call_nested
> ep_loop_check_proc
> mutex_lock ep->mtx
> mutex_unlock ep->mtx
> mutex_lock ep->mtx
> ep_eventpoll_poll
> ep_ptable_queue_proc
> ep_call_nested
> ep_poll_readyevents_pro
> ep_scan_ready_list
> mutex_lock ep->mtx
> ep_read_events_proc
> mutex_unlock ep->mtx
> mutex_unlock ep->mtx
> mutex_unlock epmutex
>
> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices
> recursive locking on ep->mtx. It is not supplied enough information to
> determine that the lock is related to two separate epoll instances (the
> outer instance and the nested instance). The solution appears to involve
> supplying lockdep that information (ie, "lockdep annotation").
>
> 3) Please see the bugzilla.redhat.com report for further background.
>
>
> Paul Bolle
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Linus Torvalds: "Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1"
Previous message: Linus Torvalds: "Re: [PATCH 1/2] random: Add support for architectural random hooks"
In reply to: Paul Bolle: "Re: recursive locking: epoll."
Next in thread: Nelson Elhage: "Re: recursive locking: epoll."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]