[PATCH mmotm] mm/munlock: mlock_vma_folio() check against VM_SPECIAL

From: Hugh Dickins
Date: Wed Mar 02 2022 - 20:35:36 EST


Although mmap_region() and mlock_fixup() take care that VM_LOCKED
is never left set on a VM_SPECIAL vma, there is an interval while
file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
still be set while VM_SPECIAL bits are added: so mlock_vma_folio()
should ignore VM_LOCKED while any VM_SPECIAL bits are set.

This showed up as a "Bad page" still mlocked, when vfree()ing pages
which had been vm_inserted by remap_vmalloc_range_partial(). While
release_pages() and __page_cache_release(), and so put_page(), catch
pages still mlocked when freeing (and clear_page_mlock() caught them
when unmapping), the vfree() path is unprepared for them. That path
could be fixed, but these pages should not have been mlocked in the
first place.

I assume that an mlockall(MCL_FUTURE) had been done in the past; or
maybe the user got to specify MAP_LOCKED on a vmalloc'ing driver mmap.

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---
Diffed against top of next-20220301 or mmotm 2022-02-28-14-45.
This patch really belongs as a fix to the mm/munlock series in
Matthew's tree, so he might like to take it in there (but the patch
here is the foliated version, so easiest to place it after foliation).

mm/internal.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

--- a/mm/internal.h
+++ b/mm/internal.h
@@ -421,8 +421,15 @@ extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
 static inline void mlock_vma_folio(struct folio *folio,
 			struct vm_area_struct *vma, bool compound)
 {
-	/* VM_IO check prevents migration from double-counting during mlock */
-	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_IO)) == VM_LOCKED) &&
+	/*
+	 * The VM_SPECIAL check here serves two purposes.
+	 * 1) VM_IO check prevents migration from double-counting during mlock.
+	 * 2) Although mmap_region() and mlock_fixup() take care that VM_LOCKED
+	 *    is never left set on a VM_SPECIAL vma, there is an interval while
+	 *    file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
+	 *    still be set while VM_SPECIAL bits are added: so ignore it then.
+	 */
+	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED) &&
 	    (compound || !folio_test_large(folio)))
 		mlock_folio(folio);
 }
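
For readers who want to see the effect of widening the mask, here is a
minimal userspace sketch of the two flag tests, not kernel code. The
helper names old_check() and new_check() are hypothetical, and the flag
values are assumed to mirror include/linux/mm.h; verify them against
the tree you build on. VM_MIXEDMAP is used only as an example of a
VM_SPECIAL bit that can appear while VM_LOCKED is still set; the
message above does not say which bit was involved in the report.

```c
#include <stdbool.h>

/* Assumed values, copied for illustration; the kernel headers are authoritative. */
#define VM_PFNMAP	0x00000400UL
#define VM_LOCKED	0x00002000UL
#define VM_IO		0x00004000UL
#define VM_DONTEXPAND	0x00040000UL
#define VM_MIXEDMAP	0x10000000UL
#define VM_SPECIAL	(VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

/* Hypothetical helper: the flags test as it stood before this patch. */
static bool old_check(unsigned long vm_flags)
{
	/* True when VM_LOCKED is set and VM_IO is clear. */
	return (vm_flags & (VM_LOCKED | VM_IO)) == VM_LOCKED;
}

/* Hypothetical helper: the flags test as changed by this patch. */
static bool new_check(unsigned long vm_flags)
{
	/* True when VM_LOCKED is set and no VM_SPECIAL bit is set. */
	return (vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED;
}
```

With vm_flags of VM_LOCKED|VM_MIXEDMAP, old_check() is true, so the
folio would have been mlocked, while new_check() is false. For plain
VM_LOCKED both tests are true, and for VM_LOCKED|VM_IO both are false,
so the original VM_IO behaviour is unchanged.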