Re: recursive locking: epoll.

From: Nelson Elhage
Date: Sat Jul 30 2011 - 17:25:33 EST


On Sat, Jul 30, 2011 at 2:26 PM, Nelson Elhage <nelhage@xxxxxxxxxxx> wrote:
> Oof, this is kinda ugly.
>
> I *believe* that as of
>
>  22bacca4 epoll: prevent creating circular epoll structures
>
> the epoll locking is correct, with the rule that two ep->mtx's can be
> locked recursively iff ep1 contains ep2 (possibly indirectly), and
> that if ep1 contains ep2, ep1 will always be locked first. Since
> 22bacca4 eliminated the possibility of epoll cycles, this means there
> is a well-defined lock order.
>
> I *think* that for any static configuration of epoll file descriptors,
> we can fix the problem by doing something like using the "call_nests"
> parameter passed by ep_call_nested as the lock subkey, but I haven't
> thought this through completely.
>
> However, since that lock order is subject to change, and even
> reversal, at runtime, I think the following (pathological) sequence of
> userspace calls will trigger lockdep warnings, even though there is
> never any risk of deadlock:

Thinking about this more, I think that the "call_nests" approach won't
have this problem, since that lets the locks change subclasses exactly
as necessary. Unless someone beats me to it, I'll see if I can put
such a patch together.

- Nelson

>
>  - Create epoll fds ep1 and ep2
>  - Add ep1 to ep2
>  - Do some operations that result in recursive locking
>  - Remove ep1 from ep2
>  - Add ep2 to ep1
>  - Do some operations that result in recursive locking
>
> In fact, that program should trigger warnings even if we did the
> pathological thing of using the address of the 'struct eventpoll' as
> the subclass [1], since it is *literally the same two locks* that are
> getting acquired in different orders at different times.
>
> I also don't see a way to simplify the epoll locking without adding
> more restrictions to how the API can be used. As far as I can tell,
> the situation really is just that nasty.
>
> - Nelson
>
> [1] Never mind that the "subclass" is an unsigned int, so we can't
>    even do that directly on 64-bit systems.
>
> On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote:
>> (Sent to the addresses get_maintainer.pl suggested and to Davide and
>> Nelson, because this is about code they cared about half a year ago.
>> CC'ed to the addresses involved until now.)
>>
>> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote:
>> > That number turned out to be 722472
>> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).
>>
>> 0) This seems to be a lockdep false alarm. The cause is an epoll
>> instance added to another epoll instance (ie, nesting epoll instances).
>> Apparently lockdep isn't supplied enough information to determine what's
>> going on here. Now there might be a number of ways to fix this. But
>> after having looked at this for quite some time and updating the above
>> bug report a number of times, I guessed that involving people outside
>> those tracking that report might move things forward towards a solution.
>> At least, I wasn't able to find a, well, clean solution.
>>
>> 1) The call chain triggering the warning with the nice
>>     *** DEADLOCK ***
>>
>> line can be summarized like this:
>>
>> sys_epoll_ctl
>>     mutex_lock                          epmutex
>>     ep_call_nested
>>         ep_loop_check_proc
>>             mutex_lock                      ep->mtx
>>             mutex_unlock                    ep->mtx
>>     mutex_lock                              ep->mtx
>>     ep_eventpoll_poll
>>         ep_ptable_queue_proc
>>         ep_call_nested
>>             ep_poll_readyevents_pro
>>                 ep_scan_ready_list
>>                     mutex_lock                  ep->mtx
>>                     ep_read_events_proc
>>                     mutex_unlock                ep->mtx
>>     mutex_unlock                            ep->mtx
>>     mutex_unlock                        epmutex
>>
>> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices
>> recursive locking on ep->mtx. It is not supplied enough information to
>> determine that the lock is related to two separate epoll instances (the
>> outer instance and the nested instance). The solution appears to involve
>> supplying lockdep that information (ie, "lockdep annotation").
>>
>> 3) Please see the bugzilla.redhat.com report for further background.
>>
>>
>> Paul Bolle
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/