Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()

From: Sean Christopherson
Date: Thu Feb 23 2023 - 12:43:38 EST


On Thu, Feb 16, 2023, Yu Zhao wrote:
> An existing selftest can quickly demonstrate the effectiveness of this
> patch. On a generic workstation equipped with 128 CPUs and 256GB DRAM:

Not my area of maintenance, but a non-existent changelog (for all intents and
purposes) for a change of this size and complexity is not acceptable.

> $ sudo max_guest_memory_test -c 64 -m 250 -s 250
>
> MGLRU run2
> ---------------
> Before ~600s
> After ~50s
> Off ~250s
>
> kswapd (MGLRU before)
> 100.00% balance_pgdat
> 100.00% shrink_node
> 100.00% shrink_one
> 99.97% try_to_shrink_lruvec
> 99.06% evict_folios
> 97.41% shrink_folio_list
> 31.33% folio_referenced
> 31.06% rmap_walk_file
> 30.89% folio_referenced_one
> 20.83% __mmu_notifier_clear_flush_young
> 20.54% kvm_mmu_notifier_clear_flush_young
> => 19.34% _raw_write_lock
>
> kswapd (MGLRU after)
> 100.00% balance_pgdat
> 100.00% shrink_node
> 100.00% shrink_one
> 99.97% try_to_shrink_lruvec
> 99.51% evict_folios
> 71.70% shrink_folio_list
> 7.08% folio_referenced
> 6.78% rmap_walk_file
> 6.72% folio_referenced_one
> 5.60% lru_gen_look_around
> => 1.53% __mmu_notifier_test_clear_young

Do you happen to know how much of the improvement is due to batching, and how
much is due to using a lockless walk?
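To make the batching vs. lockless split concrete, here is a rough C model. It assumes the cost that matters is how often KVM's mmu_lock is taken, which is the _raw_write_lock hotspot under kvm_mmu_notifier_clear_flush_young in the "before" profile. The enum, the function name and the batch size are hypothetical. They are not taken from the patch or from KVM.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code: how many times KVM's mmu_lock
 * would be taken to harvest the accessed bits of nr_pages guest pages.
 */
enum harvest_mode {
	PER_PAGE_LOCKED,	/* one clear_flush_young() call, and one lock, per page */
	BATCHED_LOCKED,		/* one call per batch, lock taken once per batch */
	BATCHED_LOCKLESS,	/* one call per batch, mmu_lock never taken */
};

static unsigned long lock_acquisitions(enum harvest_mode mode,
				       unsigned long nr_pages,
				       unsigned long batch)
{
	assert(batch > 0);

	switch (mode) {
	case PER_PAGE_LOCKED:
		return nr_pages;
	case BATCHED_LOCKED:
		/* Round up: a partial final batch still takes the lock once. */
		return (nr_pages + batch - 1) / batch;
	case BATCHED_LOCKLESS:
		return 0;
	}
	return 0;
}
```

With 4096 pages and a batch of 64, the three modes give 4096, 64 and 0 acquisitions. Measuring the BATCHED_LOCKED case on its own is what would separate the two effects.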

> @@ -5699,6 +5797,9 @@ static ssize_t show_enabled(struct kobject *kobj, struct kobj_attribute *attr, c
> if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
> caps |= BIT(LRU_GEN_NONLEAF_YOUNG);
>
> + if (kvm_arch_has_test_clear_young() && get_cap(LRU_GEN_SPTE_WALK))
> + caps |= BIT(LRU_GEN_SPTE_WALK);

As alluded to in patch 1, unless batching the walks when KVM does _not_ support
a lockless walk is somehow _worse_ than using the existing mmu_notifier_clear_flush_young(),
I think batching the calls should be conditional only on LRU_GEN_SPTE_WALK. Or,
if we want to avoid batching when there are no mmu_notifier listeners, probe the
mmu_notifiers. But don't call into KVM directly.