Re: [syzbot] WARNING in follow_hugetlb_page

From: John Hubbard
Date: Fri May 20 2022 - 18:56:15 EST


On 5/20/22 15:19, Minchan Kim wrote:
> The memory offline would be an issue so we shouldn't allow pinning of any
> pages in *movable zone*.
>
> Isn't alloc_contig_range just best effort? Then, it wouldn't be a big
> problem to allow pinning on those area. The matter is what target range
> on alloc_contig_range is backed by CMA or movable zone and usecases.
>
> IOW, movable zone should be never allowed. But CMA case, if pages
> are used by normal process memory instead of hugeTLB, we shouldn't
> allow longterm pinning since someone can claim those memory suddenly.
> However, we are fine to allow longterm pinning if the CMA memory
> already claimed and mapped at userspace(hugeTLB case IIUC).


From Mike's comments and yours, plus a rather quick reading of some
CMA-related code in mm/hugetlb.c (free_gigantic_page(),
alloc_gigantic_page()), the following seems true:

a) hugetlbfs can allocate pages *from* CMA, via cma_alloc()

b) while hugetlbfs is using those CMA-allocated pages, it is debatable
whether those pages should be allowed to be long term pinned. That's
because there are two cases:

Case 1: pages are longterm pinned, then released, all while
owned by hugetlbfs. No problem.

Case 2: pages are longterm pinned, but then hugetlbfs releases the
pages entirely (via unmounting hugetlbfs, I presume). In
this case, we now have CMA pages that are long-term pinned,
and that's the state we want to avoid.

The reason it is debatable is that hugetlbfs is intended to be used
long term, itself. The expected use cases do not normally include a
lot of short term mounting and unmounting.
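
As a rough illustration of the two cases, here is a tiny userspace
model in C. It is only a sketch: every name in it (model_page,
model_pin(), model_release_to_cma(), and so on) is hypothetical and is
not a kernel API. It assumes that "releasing the pages entirely" means
the page goes back to the CMA pool, and it only tracks who owns a page
and how many long-term pins are outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_owner { MODEL_OWNER_HUGETLBFS, MODEL_OWNER_CMA_POOL };

struct model_page {
	enum model_owner owner;	/* who currently holds the CMA-backed page */
	int longterm_pins;	/* outstanding long-term pins */
};

static void model_pin(struct model_page *p)
{
	p->longterm_pins++;
}

static void model_unpin(struct model_page *p)
{
	assert(p->longterm_pins > 0);	/* unpin must match an earlier pin */
	p->longterm_pins--;
}

/* Assumption: hugetlbfs hands the page back to the CMA pool. */
static void model_release_to_cma(struct model_page *p)
{
	p->owner = MODEL_OWNER_CMA_POOL;
}

/* The state to avoid: a page back in the CMA pool that is still pinned. */
static bool model_is_bad_state(const struct model_page *p)
{
	return p->owner == MODEL_OWNER_CMA_POOL && p->longterm_pins > 0;
}
```

In this model, pin, unpin, then release (Case 1) never reaches the bad
state, while pin followed by release (Case 2) does.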

And whichever way that debate goes, we need to allow it to be
fixable, by not tying "is pinnable" to "using gup/pup". The caller
has the context that is needed to make that policy decision, but
gup/pup does not.
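
To make the "policy belongs to the caller" point concrete, here is a
minimal userspace sketch in C. All of the names (sketch_page,
sketch_pin(), the policy_*() helpers, caller_pin_longterm()) are made
up for illustration and do not correspond to real gup/pup code. The
two policies are just one possible reading of the positions discussed
in this thread, not a proposal for the actual checks.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's zone or page types. */
enum zone_kind { ZONE_KIND_NORMAL, ZONE_KIND_MOVABLE, ZONE_KIND_CMA };

struct sketch_page {
	enum zone_kind zone;
	bool hugetlb_owned;	/* already claimed by hugetlbfs */
	int pins;
};

/* Low-level step: takes the pin and makes no policy decision. */
static void sketch_pin(struct sketch_page *p)
{
	p->pins++;
}

typedef bool (*pin_policy_fn)(const struct sketch_page *p);

/* Policy A (assumed): never pin anything in the movable zone or in CMA. */
static bool policy_strict(const struct sketch_page *p)
{
	return p->zone == ZONE_KIND_NORMAL;
}

/* Policy B (assumed): as A, but allow CMA pages that hugetlbfs owns. */
static bool policy_allow_hugetlb_cma(const struct sketch_page *p)
{
	if (p->zone == ZONE_KIND_MOVABLE)
		return false;
	if (p->zone == ZONE_KIND_CMA)
		return p->hugetlb_owned;
	return true;
}

/* The caller has the context, so the caller picks the policy. */
static bool caller_pin_longterm(struct sketch_page *p, pin_policy_fn allowed)
{
	if (!allowed(p))
		return false;
	sketch_pin(p);
	return true;
}
```

With this split, changing which pages count as pinnable means changing
or swapping a policy function at the call site, and the low-level
pinning step stays untouched.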

At this point, I think it's time to fix up the problems and restore
previous behavior, by choosing Case 1 behavior for now, and also by
lifting the is_pinnable_page() checks up a level, as noted in my
other thread. I can do that, unless someone sees a flaw in the
reasoning.

thanks,
--
John Hubbard
NVIDIA