Re: [PATCH] mm/compaction:let proactive compaction order configurable

From: Chu,Kaiping
Date: Tue Apr 13 2021 - 22:11:58 EST


Hi Rientjes,
In our case we don't care about the allocation latency of transparent huge pages, but proactive compaction is still really useful to us. Without proactive compaction, the kernel compacts memory only when a high-order allocation is about to fail, which is too late. When the machine is under heavy load, many processes may trigger compaction at the same time, which leads to serious lock contention and makes the machine very slow.
Doing proactive compaction from time to time keeps the fragmentation index low and reduces the soft lockup rate.
The order of 3 or 4 is only an empirical value; we may change it according to machine load.

BR,
Chu Kaiping

-----Original Message-----
From: David Rientjes <rientjes@xxxxxxxxxx>
Sent: April 13, 2021 2:26
To: Chu,Kaiping <chukaiping@xxxxxxxxx>
Cc: mcgrof@xxxxxxxxxx; keescook@xxxxxxxxxxxx; yzaikin@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
Subject: Re: [PATCH] mm/compaction:let proactive compaction order configurable

On Mon, 12 Apr 2021, chukaiping wrote:

> Currently the proactive compaction order is fixed to
> COMPACTION_HPAGE_ORDER(9), it's OK in most machines with lots of
> normal 4KB memory, but it's too high for the machines with small
> normal memory, for example the machines with most memory configured as
> 1GB hugetlbfs huge pages. In these machines the max order of free
> pages is often below 9, and it's always below 9 even with hard
> compaction. This leads to proactive compaction being triggered very
> frequently. In these machines we only care about orders 3 or 4.
> This patch exports the order to proc and makes it configurable by the
> user; the default value is still COMPACTION_HPAGE_ORDER.
>

I'm curious why you have proactive compaction enabled at all in this case?

The order-9 threshold is likely to optimize for hugepage availability, but in your setup it appears that's not a goal.

So what benefit does proactive compaction provide if only done for order-3 or order-4?

> Signed-off-by: chukaiping <chukaiping@xxxxxxxxx>
> ---
> include/linux/compaction.h | 1 +
> kernel/sysctl.c | 10 ++++++++++
> mm/compaction.c | 7 ++++---
> 3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index ed4070e..151ccd1 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -83,6 +83,7 @@ static inline unsigned long compact_gap(unsigned int order)
> #ifdef CONFIG_COMPACTION
> extern int sysctl_compact_memory;
> extern unsigned int sysctl_compaction_proactiveness;
> +extern unsigned int sysctl_compaction_order;
> extern int sysctl_compaction_handler(struct ctl_table *table, int write,
> void *buffer, size_t *length, loff_t *ppos);
> extern int sysctl_extfrag_threshold;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 62fbd09..277df31 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -114,6 +114,7 @@
> static int __maybe_unused neg_one = -1;
> static int __maybe_unused two = 2;
> static int __maybe_unused four = 4;
> +static int __maybe_unused ten = 10;
> static unsigned long zero_ul;
> static unsigned long one_ul = 1;
> static unsigned long long_max = LONG_MAX;
> @@ -2871,6 +2872,15 @@ int proc_do_static_key(struct ctl_table *table, int write,
> .extra2 = &one_hundred,
> },
> {
> + .procname = "compaction_order",
> + .data = &sysctl_compaction_order,
> + .maxlen = sizeof(sysctl_compaction_order),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ZERO,
> + .extra2 = &ten,
> + },
> + {
> .procname = "extfrag_threshold",
> .data = &sysctl_extfrag_threshold,
> .maxlen = sizeof(int),
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e04f447..a192996 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1925,16 +1925,16 @@ static bool kswapd_is_running(pg_data_t *pgdat)
>
> /*
> * A zone's fragmentation score is the external fragmentation wrt to the
> - * COMPACTION_HPAGE_ORDER. It returns a value in the range [0, 100].
> + * sysctl_compaction_order. It returns a value in the range [0, 100].
> */
> static unsigned int fragmentation_score_zone(struct zone *zone)
> {
> - return extfrag_for_order(zone, COMPACTION_HPAGE_ORDER);
> + return extfrag_for_order(zone, sysctl_compaction_order);
> }
>
> /*
> * A weighted zone's fragmentation score is the external fragmentation
> - * wrt to the COMPACTION_HPAGE_ORDER scaled by the zone's size. It
> + * wrt to the sysctl_compaction_order scaled by the zone's size. It
> * returns a value in the range [0, 100].
> *
> * The scaling factor ensures that proactive compaction focuses on larger
> @@ -2666,6 +2666,7 @@ static void compact_nodes(void)
> * background. It takes values in the range [0, 100].
> */
> unsigned int __read_mostly sysctl_compaction_proactiveness = 20;
> +unsigned int __read_mostly sysctl_compaction_order = COMPACTION_HPAGE_ORDER;
>
> /*
> * This is the entry point for compacting all nodes via
> --
> 1.7.1
>
>