Re: [PATCH 4.4 50/92] mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read

From: Jan Kara
Date: Thu May 24 2018 - 05:58:35 EST


On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
> 4.4-stable review patch. If anyone has any objections, please let me know.

Just one objection: Why does stable care about this (and the previous
patch)? I've checked the stable queue and I don't see anything that would
have these patches as a prerequisite. And on their own, they are only
cleanups without substantial gains.

Honza

>
> ------------------
>
> From: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>
> commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream.
>
> In the generic read paths the kernel looks up a page in the page cache
> and if it's up to date, it is used. If not, the page lock is acquired
> to wait for IO to complete and then check the page. If multiple
> processes are waiting on IO, they all serialise against the lock and
> duplicate the checks. This is unnecessary.
>
> The page lock in itself does not give any guarantees to the callers
> about the page state as it can be immediately truncated or reclaimed
> after the page is unlocked. It's sufficient to wait_on_page_locked and
> then continue if the page is up to date on wakeup.
>
> It is possible that a truncated but up-to-date page is returned but the
> reference taken during read prevents it disappearing underneath the
> caller and the data is still valid if PageUptodate.
>
> The overall impact is small as even if processes serialise on the lock,
> the lock section is tiny once the IO is complete. Profiles indicated
> that unlock_page and friends are generally a tiny portion of a
> read-intensive workload. An artificial test was created that had
> instances of dd access a cache-cold file on an ext4 filesystem and
> measure how long the read took.
>
> paralleldd
> 4.4.0 4.4.0
> vanilla avoidlock
> Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%)
> Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%)
> Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%)
> Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%)
> Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%)
> Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%)
> Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%)
> Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%)
> Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%)
> Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%)
>
> The impact is small but intuitively, it makes sense to avoid unnecessary
> calls to lock_page.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
> ---
> mm/filemap.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1581,6 +1581,15 @@ find_page:
> index, last_index - index);
> }
> if (!PageUptodate(page)) {
> + /*
> + * See comment in do_read_cache_page on why
> + * wait_on_page_locked is used to avoid unnecessarily
> + * serialisations and why it's safe.
> + */
> + wait_on_page_locked_killable(page);
> + if (PageUptodate(page))
> + goto page_ok;
> +
> if (inode->i_blkbits == PAGE_CACHE_SHIFT ||
> !mapping->a_ops->is_partially_uptodate)
> goto page_not_up_to_date;
> @@ -2253,12 +2262,52 @@ filler:
> if (PageUptodate(page))
> goto out;
>
> + /*
> + * Page is not up to date and may be locked due one of the following
> + * case a: Page is being filled and the page lock is held
> + * case b: Read/write error clearing the page uptodate status
> + * case c: Truncation in progress (page locked)
> + * case d: Reclaim in progress
> + *
> + * Case a, the page will be up to date when the page is unlocked.
> + * There is no need to serialise on the page lock here as the page
> + * is pinned so the lock gives no additional protection. Even if the
> + * the page is truncated, the data is still valid if PageUptodate as
> + * it's a race vs truncate race.
> + * Case b, the page will not be up to date
> + * Case c, the page may be truncated but in itself, the data may still
> + * be valid after IO completes as it's a read vs truncate race. The
> + * operation must restart if the page is not uptodate on unlock but
> + * otherwise serialising on page lock to stabilise the mapping gives
> + * no additional guarantees to the caller as the page lock is
> + * released before return.
> + * Case d, similar to truncation. If reclaim holds the page lock, it
> + * will be a race with remove_mapping that determines if the mapping
> + * is valid on unlock but otherwise the data is valid and there is
> + * no need to serialise with page lock.
> + *
> + * As the page lock gives no additional guarantee, we optimistically
> + * wait on the page to be unlocked and check if it's up to date and
> + * use the page if it is. Otherwise, the page lock is required to
> + * distinguish between the different cases. The motivation is that we
> + * avoid spurious serialisations and wakeups when multiple processes
> + * wait on the same page for IO to complete.
> + */
> + wait_on_page_locked(page);
> + if (PageUptodate(page))
> + goto out;
> +
> + /* Distinguish between all the cases under the safety of the lock */
> lock_page(page);
> +
> + /* Case c or d, restart the operation */
> if (!page->mapping) {
> unlock_page(page);
> page_cache_release(page);
> goto repeat;
> }
> +
> + /* Someone else locked and filled the page in a very small window */
> if (PageUptodate(page)) {
> unlock_page(page);
> goto out;
>
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR