Re: [PATCH v5 7/7] mm: Remove the now-unnecessary mmget_still_valid() hack

From: Hugh Dickins
Date: Mon Aug 31 2020 - 02:07:16 EST


On Thu, 27 Aug 2020, Jann Horn wrote:

> The preceding patches have ensured that core dumping properly takes the
> mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
> its users.

Hi Jann, while the only tears to be shed over losing mmget_still_valid()
will be tears of joy, I think you need to explain why you believe it's
safe to remove the instance in mm/khugepaged.c, which you'll have found
I moved just recently, to cover an extra case (sorry for not Cc'ing you).

> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
>
> static inline int khugepaged_test_exit(struct mm_struct *mm)
> {
> - return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
> + return atomic_read(&mm->mm_users) == 0;
> }
>
> static bool hugepage_vma_check(struct vm_area_struct *vma,

The movement (which you have correctly followed) was in
bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
but the "pmd .. physical page 0" issue is explained better in its parent
18e77600f7a1 ("khugepaged: retract_page_tables() remember to test exit")

I think your core dumping is still reading the page tables without
holding mmap_lock, so it is still vulnerable to that extra issue. It
won't be as satisfying as removing all traces of mmget_still_valid(),
but right now I think you should add an mm->core_state check there
instead, something like the sketch below.
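
An untested sketch, just to show the shape of it (assuming mm->core_state
is still what marks an mm being dumped, set under mmap_lock when the
dump starts):

	static inline int khugepaged_test_exit(struct mm_struct *mm)
	{
		/*
		 * mm_users == 0 catches exit; mm->core_state catches a
		 * core dump in progress, whose walk of the page tables
		 * khugepaged must not race against.
		 */
		return atomic_read(&mm->mm_users) == 0 || mm->core_state;
	}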

(I do have a better solution in use, but it's a much larger patch that
will take a lot more effort to get in: checks in pte_offset_map_lock(),
perhaps returning NULL when the pmd is transitioning, requiring the
caller to retry; roughly the pattern sketched below.)
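
To give the flavour of the caller side (hypothetical interface, of
course: pte_offset_map_lock() cannot fail in today's tree):

	/* hypothetical caller: check whether addr maps a present pte */
	static bool pte_present_at(struct mm_struct *mm, pmd_t *pmd,
				   unsigned long addr)
	{
		spinlock_t *ptl;
		pte_t *pte;
		bool ret;
	again:
		pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
		if (!pte) {
			/*
			 * pmd was transitioning under us: retry (a real
			 * caller might instead treat this as pte_none).
			 */
			cond_resched();
			goto again;
		}
		ret = pte_present(*pte);
		pte_unmap_unlock(pte, ptl);
		return ret;
	}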

Or maybe it's me who has missed what you're doing instead.

Hugh