RE: [PATCH v1] arch: Enable function alignment for arm64

From: David Laight
Date: Wed Feb 22 2023 - 17:42:45 EST


From: Mina Almasry
> Sent: 22 February 2023 22:16
>
> On Wed, Jan 25, 2023 at 3:16 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > From: Will Deacon <will@xxxxxxxxxx>
> > > Sent: 24 January 2023 12:09
> > >
> > > On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > > > We recently ran into a double-digit percentage hackbench regression
> > > > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > > > before decrementing h->resv_huge_pages") to an older kernel. This was
> > > > surprising since hackbench does not use hugetlb pages at all and the
> > > > modified code is not invoked. After some debugging we found that the
> > > > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > > > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > > > for arm64. I suggest enabling it by default for arm64 if possible.
> > > >
> > ...
> > >
> > > This increases the size of .text for a defconfig build by ~2%, so I think it
> > > would be nice to have some real numbers for the performance uplift. Are you
> > > able to elaborate beyond "double-digit percentage hackbench regression"?
> > >
> > > In general, however, I'm supportive of the patch (and it seems that x86
> > > does the same thing) so:
> >
> > I bet it just changes the alignment of the code so that more
> > functions are using different cache lines.
> >
> > All sorts of other random changes are likely to have a similar effect.
> >
> > Cache-line aligning the start of a function probably reduces the
> > number of cache lines the function needs - but that isn't guaranteed.
> > It also slightly reduces the delay on a cache miss - but they are so
> > slow it probably makes almost no difference.
> >
>
> David, my understanding is similar to yours. I.e. without explicit alignment:
>
> 1. Random changes to the code can cause critical path functions to
> become aligned or unaligned which will cause perf
> regressions/improvements.
> 2. Random changes to the code can cause critical path functions to be
> placed near a cache line boundary, causing one more cache line to be
> loaded when they are run, which will cause perf regressions.
>
> So for these very reasons function alignment is a good thing.

Except that aligning functions doesn't necessarily improve things.

Even within a function the alignment of the top of a loop (that
is executed a lot) might matter more than the alignment of the
function itself.
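
As a rough illustration of the cache-line arithmetic, the Python sketch below counts how many I-cache lines a hot region of code touches, depending on where it starts. It assumes 64-byte lines, which is common on arm64 and x86 but should be checked for the actual core. The region can equally be a whole function or just the body of a hot loop. Aligning the start can only bring the count down to ceil(size / 64), and a short region that already sits inside one line gains nothing.

```python
LINE = 64  # assumed I-cache line size in bytes

def lines_spanned(start: int, size: int, line: int = LINE) -> int:
    """Number of cache lines touched by `size` bytes of code beginning at `start`."""
    if size <= 0:
        return 0
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1

# A 100-byte loop body: 2 lines if it starts on a line boundary, 3 if it starts 40 bytes in.
print(lines_spanned(0, 100), lines_spanned(40, 100))   # 2 3
# A 20-byte loop body starting 8 bytes into a line already fits in one line.
print(lines_spanned(8, 20), lines_spanned(64, 20))     # 1 1
```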

Any change will affect which code shares cache lines, so it can
stop the working set of some test (or a code loop with deep
function calls) from fitting in the I-cache, making it
much slower.
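
Below is a toy model of the sharing effect. It is a sketch only, and not how the kernel linker actually lays out .text. Functions are placed back to back, each start is rounded up to an alignment, and the model counts the distinct cache lines touched by the hot functions. Several details are invented for illustration:

- the function names and sizes;
- which functions are hot;
- the 64-byte line size.

The model also ignores associativity and replacement. With 4-byte alignment, the two small hot functions a and b share one line. With 64-byte alignment each of them gets its own line, so the hot working set grows from 2 lines to 3 even though no individual function got worse.

```python
from typing import Dict, List, Sequence, Set, Tuple

LINE = 64  # assumed I-cache line size in bytes

def layout(funcs: List[Tuple[str, int]], align: int) -> Dict[str, int]:
    """Place functions back to back, rounding each start up to a multiple of `align`."""
    addr = 0
    starts: Dict[str, int] = {}
    for name, size in funcs:
        addr = (addr + align - 1) // align * align  # pad up to the next aligned address
        starts[name] = addr
        addr += size
    return starts

def hot_lines(funcs: List[Tuple[str, int]], hot: Sequence[str],
              align: int, line: int = LINE) -> Set[int]:
    """Indices of the cache lines touched by the functions named in `hot`."""
    sizes = dict(funcs)
    starts = layout(funcs, align)
    touched: Set[int] = set()
    for name in hot:
        start = starts[name]
        touched.update(range(start // line, (start + sizes[name] - 1) // line + 1))
    return touched

# Hypothetical functions as (name, size in bytes); a, b and d are on the hot path.
funcs = [("a", 20), ("b", 20), ("c", 100), ("d", 40)]
hot = ["a", "b", "d"]
print(len(hot_lines(funcs, hot, align=4)))   # 2: a and b share line 0
print(len(hot_lines(funcs, hot, align=64)))  # 3: a, b and d each get their own line
```

Other layouts go the other way: aligning a hot function that straddles a line boundary does help it. The outcome depends on the layout as a whole, not just on the alignment setting.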

Changing the size also affects where the page boundaries fall
(especially if not using very large pages).
If the hot code then spans more pages than there are TLB entries,
things will slow down.
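
Here is the same kind of back-of-the-envelope sketch for the I-TLB. The following are all assumptions picked for illustration, and real cores and workloads differ:

- the text is mapped with 4 KiB pages (the "not using very large pages" case);
- the I-TLB has 48 entries;
- the hot path is spread over 192 KiB.

The 2% growth borrows Will's defconfig .text figure and applies it to the hot path purely as an example. In this model, that growth is enough to push the hot code from 48 pages to 49, one more than the hypothetical I-TLB holds.

```python
PAGE = 4096        # assumed 4 KiB pages for text (no large/block mappings)
ITLB_ENTRIES = 48  # hypothetical I-TLB capacity; real cores differ

def pages_spanned(start: int, size: int, page: int = PAGE) -> int:
    """Number of distinct pages touched by `size` bytes of code starting at `start`."""
    if size <= 0:
        return 0
    return (start + size - 1) // page - start // page + 1

hot_span = 192 * 1024          # hypothetical: hot code path spread over 192 KiB
grown = int(hot_span * 1.02)   # the same code after ~2% growth from padding
for span in (hot_span, grown):
    n = pages_spanned(0, span)
    print(span, n, "fits" if n <= ITLB_ENTRIES else "exceeds the I-TLB")
```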

I think one version of gcc used to align most labels.
While the code might be slightly faster, the bloat actually
made it a net loss.

So aligning functions might help, but it might just
make things worse.

David
