Re: [PATCH 0/4] kbuild: build speed improvment of CONFIG_TRIM_UNUSED_KSYMS

From: Nicolas Pitre
Date: Tue Mar 09 2021 - 11:50:15 EST


On Tue, 9 Mar 2021, Masahiro Yamada wrote:

> On Fri, Feb 26, 2021 at 4:24 AM Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> >
> > If CONFIG_TRIM_UNUSED_KSYMS is enabled then build time willincrease.
> > That comes with the feature.
>
> This patch set intends to change this.
> TRIM_UNUSED_KSYMS will build without additional cost,
> like LD_DEAD_CODE_DATA_ELIMINATION.

OK... I do see how you're going about it.

> > > Modules are relocatable ELF.
> > > Clang LTO cannot eliminate any code.
> > > GCC LTO does not work with relocatable ELF
> > > in the first place.
> >
> > I don't think I follow you here. What relocatable ELF has to do with LTO?
>
> What is important is,
> GCC LTO is the feature of gcc, not binutils.
> That is, LD_FINAL is $(CC).

Exact.

> GCC LTO can be implemented for the final link stage
> by using $(CC) as the linker driver.
> Then, it can determine which code is unreachable.
> In other words, GCC LTO works only when building
> the final executable.

Yes. And it does so by filling .o files with its intermediate code
representation and not ELF code.

> On the other hand, a relocatable ELF is created
> by $(LD) -r by combining some objects together.
> The relocatable ELF can be fed to another $(LD) -r,
> or the final link stage.

You still can create relocatable ELF using LTO. But LTO stops there.
>From that point on, .o files will no longer contain data that LTO can
use if you further combine those object files together. But until that
point, LTO is still usable.

> As I said above, modules are created by $(LD) -r.
> It is not possible to implement GCC LTO for modules.

If I remember correctly (that was a while ago) the problem with LTO and
the kernel had to do with the fact that avery subdirectory was gathering
object files in built-in.o using ld -r. At some point we switched to
gathering object files into built-in.a files where no linking is taking
place. The real linking happens in vmlinux.o where LTO may now do its
magic.

The same is true for modules. Compiling foo_module.c into foo_module.o
will create a .o file with LTO data rather than executable code. But
when you create the final .o for the module then LTO takes place and
produce the relocatable ELF executable.

> > I've successfully used gcc LTO on the kernel quite a while ago.
> >
> > For a reference about binary size reduction with LTO and
> > CONFIG_TRIM_UNUSED_KSYMS please read this article:
> >
> > https://lwn.net/Articles/746780/
>
> Thanks for the great articles.
>
> Just for curiosity, I think you used GCC LTO from
> Andy's GitHub.

Right. I provided the reference in the preceding article:
https://lwn.net/Articles/744507/

> In the article, you took stm32_defconfig as an example,
> but ARM does not select ARCH_SUPPORTS_LTO.
>
> Did you add some local hacks to make LTO work
> for ARM?

Of course. This article was written in 2017 and no LTO support at all
was in mainline back then. But, besides adding CONFIG_LTO, very little
was needed to make it compile, and I did upstream most changes such as
commit 75fea300d7, commit a85b2257a5, commit 5d48417592, commit
19c233b79d, etc.

> I tried the lto-5.8.1 branch, but
> I did not even succeed in building x86 + LTO.

My latest working LTO branch (i.e. last time I worked on it) is much
older than that.

Maybe people aren't very excited about LTO because it makes the time to
recompiling the kernel many times longer because gcc does its
optimization passes on the whole kernel even if you modify a single
file.


Nicolas