Re: [PATCH] autofs4 deadlock during expire - kernel 2.6

From: Mike Waychison
Date: Wed Sep 24 2003 - 11:00:16 EST


Ian Kent wrote:
On Wed, 24 Sep 2003, Arjan van de Ven wrote:


On Wed, 2003-09-24 at 15:01, Ian Kent wrote:

This is a corrected patch for the autofs4 deadlock problem I posted about

@@ -206,6 +207,11 @@

 	interruptible_sleep_on(&wq->queue);
 
+	if (waitqueue_active(&wq->queue) && current != wq->owner) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(wq->wait_ctr * (HZ/10));
+	}
+

this really really looks like you're trying to paper over a bug by
changing the timing somewhere instead of fixing it...


Agreed.


also are you sure the deadlock isn't because of the racy use of
interruptible_sleep_on?


I think the deadlock itself needs to be properly identified.

Could you explain where the deadlock is actually occurring? I skimmed the automount 4 code as well as autofs4 and I don't see the deadlock. The 'owner' in the case of an expiry will be a child process of the daemon, within a call to ioctl(EXPIRE_MULTI), correct? Having it released from the waitqueue first should not affect the flow of execution or release anything from a deadlock.

I don't see how having it wake up before any other racing processes solves anything.

I think Arjan is right in that the race is due to the nautilus process entering the sleep_on after a call to wake_up(&wq->queue). I don't know if a change to using a workqueue is best. How about refactoring that chunk of code to use wait_event_interruptible on the queue, which should be clear of any waitqueue/sleep_on races?
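
To make the lost-wakeup argument concrete, here is a rough userspace analogy using pthreads. It is not kernel code and not the autofs4 implementation: struct demo_waitq and all the demo_* functions are made-up names, and the mutex/condition-variable pair only stands in for the kernel waitqueue. The point it tries to show is that a sleeper which re-checks a condition (here, name == NULL) before sleeping still returns even if the wakeup was issued before it got there, whereas an unconditional sleep at that point would block forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the autofs4 wait queue entry (assumption). */
struct demo_waitq {
	pthread_mutex_t lock;
	pthread_cond_t queue;
	const char *name;	/* cleared by the waker, like wq->name */
};

/* Waker: publish the condition first, then wake every sleeper. */
void demo_release(struct demo_waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->name = NULL;
	pthread_cond_broadcast(&wq->queue);
	pthread_mutex_unlock(&wq->lock);
}

/* Sleeper: test the condition before sleeping and after each wakeup. */
void demo_wait(struct demo_waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	while (wq->name != NULL)
		pthread_cond_wait(&wq->queue, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
}

static void *demo_sleeper(void *arg)
{
	demo_wait(arg);
	return NULL;
}

/*
 * Issue the wakeup first, then start the sleeper. Returns 1 once the
 * late sleeper has come back, which it can only do because it checks
 * the condition instead of sleeping unconditionally.
 */
int demo_late_sleeper_returns(void)
{
	struct demo_waitq wq;
	pthread_t thread;
	int rc;

	pthread_mutex_init(&wq.lock, NULL);
	pthread_cond_init(&wq.queue, NULL);
	wq.name = "pending";

	demo_release(&wq);	/* the wakeup nobody is waiting for yet */

	rc = pthread_create(&thread, NULL, demo_sleeper, &wq);
	assert(rc == 0);
	rc = pthread_join(thread, NULL);
	assert(rc == 0);

	pthread_cond_destroy(&wq.queue);
	pthread_mutex_destroy(&wq.lock);
	return wq.name == NULL;
}
```

Calling demo_late_sleeper_returns() should return 1 rather than hang. In the kernel the equivalent check is done by the wait_event_interruptible macro itself, which evaluates the condition before it puts the task to sleep and again after every wakeup.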



OK so maybe I should have suggestions instead of comments.

Please elaborate.


How about you try out this quick patch I threw together.

Mike Waychison

===== waitq.c 1.6 vs edited =====
--- 1.6/fs/autofs4/waitq.c Fri Feb 7 12:25:20 2003
+++ edited/waitq.c Wed Sep 24 15:48:30 2003
@@ -204,7 +204,7 @@
 	recalc_sigpending();
 	spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
 
-	interruptible_sleep_on(&wq->queue);
+	wait_event_interruptible(wq->queue, wq->name == NULL);
 
 	spin_lock_irqsave(&current->sighand->siglock, irqflags);
 	current->blocked = oldset;