Re: [PATCH v4 6/6] mm: madvise: Avoid split during MADV_PAGEOUT and MADV_COLD
From: Ryan Roberts
Date: Fri Mar 15 2024 - 06:55:36 EST

On 15/03/2024 10:35, David Hildenbrand wrote:
>> - if (!pageout && pte_young(ptent)) {
>> - ptent = ptep_get_and_clear_full(mm, addr, pte,
>> - tlb->fullmm);
>> - ptent = pte_mkold(ptent);
>> - set_pte_at(mm, addr, pte, ptent);
>> - tlb_remove_tlb_entry(tlb, pte, addr);
>> + if (!pageout) {
>> + for (; nr != 0; nr--, pte++, addr += PAGE_SIZE) {
>> + if (ptep_test_and_clear_young(vma, addr, pte))
>> + tlb_remove_tlb_entry(tlb, pte, addr);
>> + }
>> }
>
>
> The following might turn out a bit nicer: Make folio_pte_batch() collect
> "any_young", then doing something like we do with "any_writable" in the fork()
> case:
>
> ...
> nr = folio_pte_batch(folio, addr, pte, ptent, max_nr,
>                      fpb_flags, NULL, &any_young);
> if (any_young)
>         ptent = pte_mkyoung(ptent);
> ...
>
> if (!pageout && pte_young(ptent)) {
> mkold_full_ptes(mm, addr, pte, nr, tlb->fullmm);
> tlb_remove_tlb_entries(tlb, pte, nr, addr);
> }
>
I thought about that but decided that it would be better to only TLBI the actual
entries that were young. Although looking at tlb_remove_tlb_entry() I see that
it just maintains a range between the lowest and highest address, so this won't
actually make any difference.
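
To illustrate the point about range tracking, here is a minimal userspace sketch. It is not the kernel's mmu_gather code: model_gather, model_remove_tlb_entry() and model_flush_span() are hypothetical names, and the only behaviour assumed is the one described above, i.e. that each recorded entry just widens a single [start, end) range.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the range fields of a TLB gather structure. */
struct model_gather {
	unsigned long start;
	unsigned long end;
};

static void model_gather_init(struct model_gather *tlb)
{
	/* Empty range: start above end. */
	tlb->start = ULONG_MAX;
	tlb->end = 0;
}

/* Widen the pending range so that it covers the page at addr. */
static void model_remove_tlb_entry(struct model_gather *tlb, unsigned long addr)
{
	if (addr < tlb->start)
		tlb->start = addr;
	if (addr + MODEL_PAGE_SIZE > tlb->end)
		tlb->end = addr + MODEL_PAGE_SIZE;
}

/*
 * Record only the pages whose bit is set in young_mask and return the
 * size in bytes of the range that would be invalidated.
 */
static unsigned long model_flush_span(unsigned long base, int nr,
				      unsigned long young_mask)
{
	struct model_gather tlb;
	int i;

	assert(nr > 0 && nr <= (int)(sizeof(young_mask) * CHAR_BIT));

	model_gather_init(&tlb);
	for (i = 0; i < nr; i++) {
		if (young_mask & (1UL << i))
			model_remove_tlb_entry(&tlb, base + i * MODEL_PAGE_SIZE);
	}

	return tlb.end > tlb.start ? tlb.end - tlb.start : 0;
}
```

In this model, a 16-entry batch where only the first and last entries are young (mask 0x8001) gives the same span as one where all 16 are young (mask 0xffff), so skipping the old entries in between saves nothing.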
So, yes, this will be a nice improvement, and will also prevent the O(n^2) pte
reads for the contpte case. I'll change this in the next version.
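
A rough sketch of where the O(n^2) comes from, as a userspace model rather than the arm64 implementation. It assumes, as a simplification, that asking for the young state of one entry in a contiguous block has to scan every entry of that block. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CONT_PTES 16

/* Counts individual entry reads performed by the model. */
static unsigned long model_pte_reads;

/* Read the young bit of one entry, counting the access. */
static bool model_read_young(const bool *block, int i)
{
	model_pte_reads++;
	return block[i];
}

/*
 * Model of a contpte-aware getter: the young state reported for any one
 * entry is the OR over all entries of the block (assumption of this sketch).
 */
static bool model_contpte_get_young(const bool *block)
{
	bool young = false;
	int i;

	for (i = 0; i < MODEL_CONT_PTES; i++)
		young |= model_read_young(block, i);

	return young;
}

/* Per-entry loop: one block-wide scan for each of the nr entries. */
static unsigned long model_reads_per_entry(const bool *block, int nr)
{
	int i;

	assert(nr > 0 && nr <= MODEL_CONT_PTES);

	model_pte_reads = 0;
	for (i = 0; i < nr; i++)
		(void)model_contpte_get_young(block);

	return model_pte_reads;
}

/* Batched: a single pass over the nr entries that collects any_young. */
static bool model_batch_any_young(const bool *block, int nr)
{
	bool any_young = false;
	int i;

	assert(nr > 0 && nr <= MODEL_CONT_PTES);

	model_pte_reads = 0;
	for (i = 0; i < nr; i++)
		any_young |= model_read_young(block, i);

	return any_young;
}
```

For a full 16-entry block, the per-entry loop in this model performs 16 * 16 reads, while the batched pass performs 16 and still reports whether any entry was young.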