Re: [PATCH] autofs: don't fail mount for transient error
From: Ian Kent
Date:  Fri Nov 03 2017 - 08:45:14 EST
On 03/11/17 09:40, NeilBrown wrote:
> 
Hi Neil, and thanks for taking the time to post the patch.
> Currently if the autofs kernel module gets an error when
> writing to the pipe which links to the daemon, then it
> marks the whole mountpoint as catatonic, and it will stop working.
> 
> It is possible that the error is transient.  This can happen
> if the daemon is slow and more than 16 requests queue up.
> If a subsequent process tries to queue a request, and is then signalled,
> the write to the pipe will return -ERESTARTSYS and autofs
> will take that as total failure.
Indeed it does.
And given the problems with a half dozen (or so) user space
applications consuming large amounts of CPU under heavy mount
and umount activity, this could happen more easily than we
expect.
> 
> So change the code to assess -ERESTARTSYS and -ENOMEM as transient
> failures which only abort the current request, not the whole
> mountpoint.
This looks good to me.
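
For anyone following along, the policy described above can be modelled
in user space roughly as below. This is only a sketch, not the kernel
code: the enum and the function names write_status() and classify() are
made up for illustration, and ERESTARTSYS is defined locally because it
is kernel-internal and not exported through <errno.h> (512 is the value
in include/linux/errno.h).

```c
#include <errno.h>
#include <stddef.h>

/* Assumption: ERESTARTSYS is not visible in user space, so define it
 * here with the value used by include/linux/errno.h. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical names, used only to label the three outcomes. */
enum notify_action {
	NOTIFY_OK,		/* packet delivered, nothing more to do */
	NOTIFY_FAIL_WAIT,	/* fail only the current request */
	NOTIFY_CATATONIC	/* give up on the whole mount point */
};

/* Models the new return expression in autofs4_write():
 * 'bytes' is what is still unwritten, 'wr' is the last write result. */
static int write_status(size_t bytes, long wr)
{
	return bytes == 0 ? 0 : wr < 0 ? (int)wr : -EIO;
}

/* Models the switch added to autofs4_notify_daemon(). */
static enum notify_action classify(int ret)
{
	switch (ret) {
	case 0:
		return NOTIFY_OK;
	case -ENOMEM:
	case -ERESTARTSYS:
		/* Transient: abort this request only */
		return NOTIFY_FAIL_WAIT;
	default:
		return NOTIFY_CATATONIC;
	}
}
```

With this model an interrupted write, e.g. write_status(8, -ERESTARTSYS),
only fails the one wait, while something like -EPIPE still takes the
mount catatonic, as before.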
> 
> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> ---
> 
> Do people think this should go to -stable?
> It isn't a crash or a data corruption, but having autofs mountpoints
> suddenly stop working is rather inconvenient.
Perhaps that's a good idea, given that the CPU usage problem I refer
to above has been around for a while now.
> 
> Thanks,
> NeilBrown
> 
> 
>  fs/autofs4/waitq.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 4ac49d038bf3..8fc41705c7cd 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
>  		spin_unlock_irqrestore(&current->sighand->siglock, flags);
>  	}
>  
> -	return (bytes > 0);
> +	/* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
> +	return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
>  }
>  
>  static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
> @@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	} pkt;
>  	struct file *pipe = NULL;
>  	size_t pktsz;
> +	int ret;
>  
>  	pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
>  		 (unsigned long) wq->wait_queue_token,
> @@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	mutex_unlock(&sbi->wq_mutex);
>  
>  	if (autofs4_write(sbi, pipe, &pkt, pktsz))
> +	switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
> +	case 0:
> +		break;
> +	case -ENOMEM:
> +	case -ERESTARTSYS:
> +		/* Just fail this one */
> +		autofs4_wait_release(sbi, wq->wait_queue_token, ret);
> +		break;
> +	default:
>  		autofs4_catatonic_mode(sbi);
> +		break;
> +	}
>  	fput(pipe);
>  }
>  
>