Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present

From: Thomas HellstrÃm (VMware)
Date: Wed Oct 09 2019 - 12:20:37 EST


Hi, Kirill.

Thanks for reviewing.

On 10/9/19 5:27 PM, Kirill A. Shutemov wrote:
On Tue, Oct 08, 2019 at 11:15:02AM +0200, Thomas HellstrÃm (VMware) wrote:
From: Thomas Hellstrom <thellstrom@xxxxxxxxxx>

The pagewalk code was unconditionally splitting transhuge pmds when a
pte_entry was present. However ideally we'd want to handle transhuge pmds
in the pmd_entry function and ptes in pte_entry function. So don't split
huge pmds when there is a pmd_entry function present, but let the callback
take care of it if necessary.
Do we have any current user that expect split_huge_pmd() in this scenario.

No. All current users either have pmd_entry (no splitting) or pte_entry (unconditional splitting)


In order to make sure a virtual address range is handled by one and only
one callback, and since pmd entries may be unstable, we introduce a
pmd_entry return code that tells the walk code to continue processing this
pmd entry rather than to move on. Since caller-defined positive return
codes (up to 2) are used by current callers, use a high value that allows a
large range of positive caller-defined return codes for future users.

Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Will Deacon <will.deacon@xxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Huang Ying <ying.huang@xxxxxxxxx>
Cc: JÃrÃme Glisse <jglisse@xxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx>
Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Thomas Hellstrom <thellstrom@xxxxxxxxxx>
---
include/linux/pagewalk.h | 8 ++++++++
mm/pagewalk.c | 28 +++++++++++++++++++++-------
2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index bddd9759bab9..c4a013eb445d 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -4,6 +4,11 @@
#include <linux/mm.h>
+/* Highest positive pmd_entry caller-specific return value */
+#define PAGE_WALK_CALLER_MAX (INT_MAX / 2)
+/* The handler did not handle the entry. Fall back to the next level */
+#define PAGE_WALK_FALLBACK (PAGE_WALK_CALLER_MAX + 1)
+
That's hacky.

Maybe just use an error code for this? -EAGAIN?

I agree this is hacky. But IMO it's a reasonably safe option. My thinking was that in the long run we'd move the positive return codes to the mm_walk private and introduce a PAGE_WALK_TERMINATE code as well.

Perhaps a completely clean and safe way would be to add an "int walk_control" in the struct mm_walk?

I'm pretty sure using an error code will come back and bite us at some point, if someone just blindly forwards error messages. But if you insist, I'll use -EAGAIN.

Please let me know what you think.

Thanks,

Thomas