Re: [PATCH v12 5/6] khugepaged: enable collapse pmd for pte-mapped THP

From: Song Liu
Date: Thu Aug 08 2019 - 13:11:22 EST




> On Aug 8, 2019, at 9:33 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 08/07, Song Liu wrote:
>>
>> +void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
>> +{
>> + unsigned long haddr = addr & HPAGE_PMD_MASK;
>> + struct vm_area_struct *vma = find_vma(mm, haddr);
>> + struct page *hpage = NULL;
>> + pmd_t *pmd, _pmd;
>> + spinlock_t *ptl;
>> + int count = 0;
>> + int i;
>> +
>> + if (!vma || !vma->vm_file ||
>> + vma->vm_start > haddr || vma->vm_end < haddr + HPAGE_PMD_SIZE)
>> + return;
>> +
>> + /*
>> + * This vm_flags may not have VM_HUGEPAGE if the page was not
>> + * collapsed by this mm. But we can still collapse if the page is
>> + * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
>> + * will not fail the vma for missing VM_HUGEPAGE
>> + */
>> + if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
>> + return;
>> +
>> + pmd = mm_find_pmd(mm, haddr);
>
> OK, I do not see anything really wrong...
>
> a couple of questions below.
>
>> + for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
>> + pte_t *pte = pte_offset_map(pmd, addr);
>> + struct page *page;
>> +
>> + if (pte_none(*pte))
>> + continue;
>> +
>> + page = vm_normal_page(vma, addr, *pte);
>> +
>> + if (!page || !PageCompound(page))
>> + return;
>> +
>> + if (!hpage) {
>> + hpage = compound_head(page);
>
> OK,
>
>> + if (hpage->mapping != vma->vm_file->f_mapping)
>> + return;
>
> is it really possible? May be WARN_ON(hpage->mapping != vm_file->f_mapping)
> makes more sense ?

I haven't found code paths lead to this, but this is technically possible.
This pmd could contain subpages from different THPs. The __replace_page()
function in uprobes.c creates similar pmd.

Current uprobe code won't really create this problem, because
!PageCompound() check above is sufficient. But it won't be difficult to
modify uprobe code to break this. For this code to be accurate and safe,
I think both this check and the one below are necessary. Also, this code
is not on any critical path, so the overhead should be negligible.

Does this make sense?

Thanks,
Song