Re: [PATCH v1] arch: Enable function alignment for arm64

From: Mina Almasry
Date: Wed Feb 22 2023 - 17:09:11 EST


On Tue, Jan 24, 2023 at 4:09 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > We recently ran into a double-digit percentage hackbench regression
> > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > before decrementing h->resv_huge_pages") to an older kernel. This was
> > surprising since hackbench does not use hugetlb pages at all and the
> > modified code is not invoked. After some debugging we found that the
> > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > for arm64. I suggest enabling it by default for arm64 if possible.
> >
> > Tested by examining function alignment on a compiled object file
> > before/after:
> >
> > After this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 0
> >
> > Before this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 94
> >
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > ---
> > arch/arm64/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index cf6d1cd8b6dc..bcc9e1578937 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -235,6 +235,7 @@ config ARM64
> > select TRACE_IRQFLAGS_SUPPORT
> > select TRACE_IRQFLAGS_NMI_SUPPORT
> > select HAVE_SOFTIRQ_ON_OWN_STACK
> > + select FUNCTION_ALIGNMENT_16B
> > help
> > ARM 64-bit (AArch64) Linux support.
>
> This increases the size of .text for a defconfig build by ~2%, so I think it
> would be nice to have some real numbers for the performance uplift. Are you
> able to elaborate beyond "double-digit percentage hackbench regression"?
>

(Sorry for the late reply)

The full story is already in the commit message. The only details I
omitted are the exact regression numbers we saw:

-16% on hackbench_process_pipes_234 (which should be
`hackbench -pipe 234 process 1000`)
-23% on hackbench_process_sockets_234 (which should be
`hackbench 234 process 1000`)

As the commit message says, it doesn't make much sense that
cherry-picking 12df140f0bdf ("mm,hugetlb: take hugetlb_lock before
decrementing h->resv_huge_pages") to our kernel causes such a huge
regression, because hackbench doesn't use hugetlb at all.

> In general, however, I'm supportive of the patch (and it seems that x86
> does the same thing) so:
>
> Acked-by: Will Deacon <will@xxxxxxxxxx>
>
> Will
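
For anyone who wants to repeat the before/after alignment check from the
commit message: the is-aligned.sh script itself was not posted, so the
following is only a rough sketch of how such a check could be done, not
the actual script. It assumes the symbol listing comes from `nm` in its
default "address type name" format and treats every symbol of type T or t
as a function. The name count_unaligned is made up for illustration.

```python
def count_unaligned(nm_output, alignment):
    """Return (total, unaligned) for text symbols listed in `nm` output.

    Assumption: each defined symbol is printed as
    "<hex address> <type> <name>", and type T/t means "in a text section".
    """
    total = 0
    unaligned = 0
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address column, so they split into
        # two fields and are skipped here along with non-text symbols.
        if len(parts) != 3 or parts[1] not in ("T", "t"):
            continue
        total += 1
        if int(parts[0], 16) % alignment != 0:
            unaligned += 1
    return total, unaligned
```

The listing could be produced with something like `nm mm/hugetlb.o` and
passed in as a string together with 16 as the alignment. Note that
addresses in a relocatable object are offsets within their section, and
that T/t can also cover text labels that are not functions, so the counts
may not match the numbers quoted above exactly.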