On Thu, 2013-10-24 at 14:47 +0200, Thomas Gleixner wrote:
> On Thu, 26 Sep 2013, zhang.yi20@xxxxxxxxxx wrote:
> > A task processes all of the robust futexes it owns when it exits,
> > to ensure that the futexes can be taken by other tasks.
> >
> > However, this does not always work well.
> > Consider this scenario:
> > 1. A robust mutex is shared between two processes, and each process
> > has multiple threads that lock the mutex.
> > 2. One of the threads locks the mutex, and the others are waiting,
> > queued in order of priority.
> > 3. The process to which the mutex owner thread belongs is dying
> > without unlocking the mutex, and handle_futex_death is invoked
> > to wake the first waiter.
> > 4. If the first waiter belongs to the same process, it has no chance
> > to return to userspace to lock the mutex, and it won't wake
> > the next waiter because it is not the owner of the mutex.
> > 5. The remaining waiters in the other process may block forever.
> Fair enough.
> > This patch removes the owner check when waking a task in handle_futex_death.
> > If the above occurs, the dying task can wake the next waiter by processing its list_op_pending.
> > The woken task can return to userspace and try to lock the mutex again.
> >
> The owner check needs to stay. The robust list is a user space managed
> list on which the kernel operates. We do not trust it at all and we
> really want as many sanity checks as possible and definitely not
> removing one.
> The simplest solution is just to wake all waiters and let them sort it
> out. We could do a more complex one by checking whether this is a
> group exit and not wake any threads belonging to the same process, but
> that's not really worth the trouble.

I think given that this is the error path, that waking everything is a
reasonable approach to addressing the problem. I haven't been able to
step through the failure case as carefully as I've wanted to, but it
appears sound just reading through it.


> Thanks,
> tglx

