Re: [PATCH Part2 RFC v4 10/40] x86/fault: Add support to handle the RMP fault for user address

From: Dave Hansen
Date: Fri Jul 30 2021 - 12:32:24 EST


On 7/30/21 9:00 AM, Vlastimil Babka wrote:
> On 7/7/21 8:35 PM, Brijesh Singh wrote:
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4407,6 +4407,15 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>  	return 0;
>>  }
>>
>> +static int handle_split_page_fault(struct vm_fault *vmf)
>> +{
>> +	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
>> +		return VM_FAULT_SIGBUS;
>> +
>> +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
>> +	return 0;
>> +}
>> +
> I think back in v1 Dave asked if khugepaged will just coalesce this back, and it
> wasn't ever answered AFAICS.
>
> I've checked the code and I think the answer is: no. Khugepaged isn't designed
> to coalesce a pte-mapped hugepage back to pmd in place. And the usual way (copy
> to a new huge page) I think will not succeed because IIRC the page is also
> FOLL_PIN pinned and khugepaged_scan_pmd() will see the elevated refcounts via
> is_refcount_suitable() and give up.

I _thought_ collapsing in place was what the whole "PTE mapped THP" bit
of code, like collapse_pte_mapped_thp(), was for. But, looking at it
again, I think that code is just for the huge tmpfs flavor of THP.

Either way, I'm kinda surprised that we don't collapse things in place.
Especially in the early days, there were lots of crazy things that
split THPs. I think even things like /proc/$pid/smaps split them.

In any case, it sounds like SEV-SNP users should probably be advised to
use MADV_NOHUGEPAGE to avoid any future surprises. At least until the
hardware folks get their act together and teach the TLB how to fracture
2M entries properly. :)
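
For reference, a rough userspace sketch of what that advice could look
like: map a region and mark it with madvise(MADV_NOHUGEPAGE) before it
is ever touched, so there is no THP to split later. The helper name
map_without_thp() and its error-reporting convention are made up for
illustration and are not part of this series. It also assumes the
caller (e.g. a VMM allocating guest memory) controls the mapping.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch series): map an anonymous
 * region and ask the kernel not to back it with transparent hugepages.
 *
 * Returns the mapping, or NULL if mmap() fails. On success, *thp_rc is
 * set to 0 if MADV_NOHUGEPAGE was applied, or to the errno reported by
 * madvise() otherwise (EINVAL is expected on kernels built without
 * CONFIG_TRANSPARENT_HUGEPAGE, where there is nothing to opt out of).
 */
static void *map_without_thp(size_t len, int *thp_rc)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Advise before first touch so no huge page is ever installed. */
	*thp_rc = madvise(p, len, MADV_NOHUGEPAGE) == 0 ? 0 : errno;
	return p;
}
```

A caller would check for NULL, then decide whether a nonzero *thp_rc is
fatal for its use case. Note that MADV_NOHUGEPAGE is only a per-VMA
opt-out; it does not change the system-wide THP setting.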