Re: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations

From: Andrew Morton
Date: Tue Dec 04 2018 - 17:50:25 EST


On Fri, 30 Nov 2018 14:58:11 -0500 Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:

> Currently we only drop the mmap_sem if there is contention on the page
> lock. The idea is that we issue readahead and then go to lock the page
> while it is under IO and we want to not hold the mmap_sem during the IO.
>
> The problem with this is the assumption that the readahead does
> anything. In the case that the box is under extreme memory or IO
> pressure we may end up not reading anything at all for readahead, which
> means we will end up reading in the page under the mmap_sem.

Please describe here why this is considered to be a problem.
Application stalling, I assume? Some description of in-the-field
observations would be appropriate. ie, how serious is the problem
whcih is being addressed.

> Instead rework filemap fault path to drop the mmap sem at any point that
> we may do IO or block for an extended period of time. This includes
> while issuing readahead, locking the page, or needing to call ->readpage
> because readahead did not occur. Then once we have a fully uptodate
> page we can return with VM_FAULT_RETRY and come back again to find our
> nicely in-cache page that was gotten outside of the mmap_sem.
>
> ...
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter);
>
> #ifdef CONFIG_MMU
> #define MMAP_LOTSAMISS (100)
> +static struct file *maybe_unlock_mmap_for_io(struct file *fpin,
> + struct vm_area_struct *vma,
> + int flags)
> +{
> + if (fpin)
> + return fpin;
> + if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
> + FAULT_FLAG_ALLOW_RETRY) {
> + fpin = get_file(vma->vm_file);
> + up_read(&vma->vm_mm->mmap_sem);
> + }
> + return fpin;
> +}

A code comment would be nice. What it does and, especially, why it does it.

> - if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
> - put_page(page);
> - return ret | VM_FAULT_RETRY;
> + /*
> + * We are open-coding lock_page_or_retry here because we want to do the
> + * readpage if necessary while the mmap_sem is dropped. If there
> + * happens to be a lock on the page but it wasn't being faulted in we'd
> + * come back around without ALLOW_RETRY set and then have to do the IO
> + * under the mmap_sem, which would be a bummer.

Expanding on "a bummer" would help here ;)

> + */
> + if (!trylock_page(page)) {
> + fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags);
> + if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> + goto out_retry;
> + if (vmf->flags & FAULT_FLAG_KILLABLE) {
> + if (__lock_page_killable(page)) {
> + /*
> + * If we don't have the right flags for
> + * maybe_unlock_mmap_for_io to do it's thing we

"its"

> + * still need to drop the sem and return
> + * VM_FAULT_RETRY so the upper layer checks the
> + * signal and takes the appropriate action.
> + */
> + if (!fpin)
> + up_read(&vmf->vma->vm_mm->mmap_sem);
> + goto out_retry;
> + }
> + } else
> + __lock_page(page);
> }
>
> /* Did it get truncated? */