Re: [PATCH v2 00/28] Add support for Clang LTO

From: Masahiro Yamada
Date: Sat Sep 05 2020 - 20:28:19 EST


On Fri, Sep 4, 2020 at 5:30 AM Sami Tolvanen <samitolvanen@xxxxxxxxxx> wrote:
>
> This patch series adds support for building x86_64 and arm64 kernels
> with Clang's Link Time Optimization (LTO).
>
> In addition to performance, the primary motivation for LTO is
> to allow Clang's Control-Flow Integrity (CFI) to be used in the
> kernel. Google has shipped millions of Pixel devices running three
> major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM
> bitcode, which Clang produces with LTO instead of ELF object files,
> postponing ELF processing until a later stage, and ensuring initcall
> ordering.
>
> Note that patches 1-4 are not directly related to LTO, but are
> needed to compile LTO kernels with ToT Clang, so I'm including them
> in the series for your convenience:
>
> - Patches 1-3 are required for building the kernel with ToT Clang,
> and IAS, and patch 4 is needed to build allmodconfig with LTO.
>
> - Patches 3-4 are already in linux-next, but not yet in 5.9-rc.
>


I still do not understand how this patch set works.
(only me?)

Please let me ask fundamental questions.



I applied this series on top of Linus' tree,
and compiled for ARCH=arm64.

I compared the kernel size with/without LTO.



[1] No LTO (arm64 defconfig, CONFIG_LTO_NONE)

$ llvm-size vmlinux
    text       data      bss        dec       hex   filename
15848692   10099449   493060   26441201   19375f1   vmlinux



[2] Clang LTO (arm64 defconfig + CONFIG_LTO_CLANG)

$ llvm-size vmlinux
    text       data      bss        dec       hex   filename
15906864   10197445   490804   26595113   195cf29   vmlinux


I also compared the size of the raw binary, arch/arm64/boot/Image.
Its size increased too.



So, in my experiment, enabling CONFIG_LTO_CLANG
increases the kernel size.
Is this expected?
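
To make the comparison concrete, here is a small userspace C sketch that
works out the per-section difference from the two llvm-size runs above.
The struct and function names are made up for illustration; only the
numbers come from my builds. The total grows by 153912 bytes.

```c
#include <stdio.h>

/* One row of llvm-size output (text/data/bss columns), in bytes. */
struct size_row {
	long text;
	long data;
	long bss;
};

/* Figures copied from the two llvm-size runs quoted above. */
const struct size_row no_lto    = { 15848692, 10099449, 493060 };
const struct size_row clang_lto = { 15906864, 10197445, 490804 };

/* Sum of the three sections, i.e. the "dec" column of llvm-size. */
long size_total(const struct size_row *r)
{
	return r->text + r->data + r->bss;
}

/* Print how much each section changed going from 'before' to 'after'. */
void print_size_delta(const struct size_row *before,
		      const struct size_row *after)
{
	printf("text  %+ld\n", after->text - before->text);
	printf("data  %+ld\n", after->data - before->data);
	printf("bss   %+ld\n", after->bss - before->bss);
	printf("total %+ld\n", size_total(after) - size_total(before));
}
```

Calling print_size_delta(&no_lto, &clang_lto) shows that text and data
both grow, while bss shrinks slightly.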


One more thing: could you explain
how Clang LTO optimizes code when the
link output is a relocatable object?



When I first learned about Clang LTO, I read this document:
https://llvm.org/docs/LinkTimeOptimization.html

With the example there, it is easy to confirm that the final
executable does not contain foo2, foo3...



In contrast to userspace programs,
kernel modules are basically relocatable objects.

Does Clang drop unused symbols from relocatable objects?
If so, how?

I implemented an example module (see the attachment),
and checked the symbols.
Nothing was dropped.

The situation is the same for built-in code
because LTO is run against vmlinux.o, which is
relocatable as well.
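
To illustrate why I am asking, here is a userspace sketch of the same
test code in a single file, with printf() assumed in place of printk().
The comments describe my understanding of the linkage rules, not what
Clang is confirmed to do in this series.

```c
#include <stdio.h>

/* Internal linkage: only code in this file can modify it. */
static int i;

/*
 * External linkage: any object linked in later may call this.
 * In a final executable, whole-program analysis can see that nobody
 * calls foo2() and drop it. In a relocatable object, a caller could
 * still show up in a later link step, so it has to be kept.
 */
void foo2(void)
{
	i = -1;
}

/* External linkage as well, so the same reasoning applies. */
void foo4(void)
{
	printf("Hi\n");
}

/* Internal linkage: can be inlined or dropped freely. */
static int foo3(void)
{
	foo4();
	return 10;
}

/*
 * As long as foo2() may be called, 'i' may be negative,
 * so the foo3() path cannot be proven dead.
 */
int foo1(void)
{
	int data = 0;

	if (i < 0)
		data = foo3();

	return data + 42;
}
```

foo1() returns 42 until something calls foo2(); after that it returns 52
and prints "Hi". That is why foo2 cannot be treated as dead code unless
the linker knows no other caller exists.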


--
Best Regards

Masahiro Yamada
From c1dc646f73bd948edbf0c4a7f1baa93ecf8c208e Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <masahiroy@kernel.org>
Date: Sun, 6 Sep 2020 08:11:32 +0900
Subject: [PATCH] lto: test module

Here is a great example for LTO:
https://llvm.org/docs/LinkTimeOptimization.html

LTO removes foo2() and foo3() from the final executable file, "main".
(foo4() is also dropped if you compile main.c with -flto as well.)

This patch integrates the example code into a kernel module.

a.c -> kernel/lto-test-a.c
main.c -> kernel/lto-test-main.c

Of course, I replaced printf() with printk().

I applied this test patch on top of Sami's v2:
https://patchwork.kernel.org/project/linux-kbuild/list/?series=343153

I compiled arm64 defconfig + CONFIG_LTO_CLANG.

This is the result:

$ aarch64-linux-gnu-nm kernel/lto-test.ko
0000000000000010 T foo1
0000000000000000 T foo2
000000000000004c T foo4
0000000000000000 B i.llvm.7710645642085602891
0000000000000000 r __kstrtab_lto_test_main
000000000000000e r __kstrtabns_lto_test_main
0000000000000000 r __ksymtab_lto_test_main
0000000000000068 T lto_test_main
0000000000000000 r _note_7
                 U printk
0000000000000000 R .str.llvm.887650332484512380
0000000000000000 D __this_module
0000000000000063 r __UNIQUE_ID_depends254
000000000000005a r __UNIQUE_ID_intree253
000000000000004c r __UNIQUE_ID_name252
0000000000000000 r __UNIQUE_ID_vermagic251

Modules are relocatable objects, not executables.
How can Clang LTO know that seemingly unreachable symbols are
really unreachable?

According to the result above, foo2 remains.

The behavior is the same for obj-y because LTO is run against
vmlinux.o, which is a relocatable ELF.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---
kernel/Makefile | 3 +++
kernel/lto-test-a.c | 22 ++++++++++++++++++++++
kernel/lto-test-a.h | 3 +++
kernel/lto-test-main.c | 12 ++++++++++++
4 files changed, 40 insertions(+)
create mode 100644 kernel/lto-test-a.c
create mode 100644 kernel/lto-test-a.h
create mode 100644 kernel/lto-test-main.c

diff --git a/kernel/Makefile b/kernel/Makefile
index 9a20016d4900..2111251c2093 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -147,3 +147,6 @@ $(obj)/kheaders_data.tar.xz: FORCE
$(call cmd,genikh)

clean-files := kheaders_data.tar.xz kheaders.md5
+
+obj-m += lto-test.o
+lto-test-objs := lto-test-a.o lto-test-main.o
diff --git a/kernel/lto-test-a.c b/kernel/lto-test-a.c
new file mode 100644
index 000000000000..15cdc320ec1e
--- /dev/null
+++ b/kernel/lto-test-a.c
@@ -0,0 +1,22 @@
+#include "lto-test-a.h"
+
+static signed int i = 0;
+
+void foo2(void) {
+	i = -1;
+}
+
+static int foo3(void) {
+	foo4();
+	return 10;
+}
+
+int foo1(void) {
+	int data = 0;
+
+	if (i < 0)
+		data = foo3();
+
+	data = data + 42;
+	return data;
+}
diff --git a/kernel/lto-test-a.h b/kernel/lto-test-a.h
new file mode 100644
index 000000000000..fca4d13a52e0
--- /dev/null
+++ b/kernel/lto-test-a.h
@@ -0,0 +1,3 @@
+extern int foo1(void);
+extern void foo2(void);
+extern void foo4(void);
diff --git a/kernel/lto-test-main.c b/kernel/lto-test-main.c
new file mode 100644
index 000000000000..6e8caa2c7667
--- /dev/null
+++ b/kernel/lto-test-main.c
@@ -0,0 +1,12 @@
+#include <linux/module.h>
+#include <linux/export.h>
+#include "lto-test-a.h"
+
+void foo4(void) {
+	printk("Hi\n");
+}
+
+int lto_test_main(void) {
+	return foo1();
+}
+EXPORT_SYMBOL(lto_test_main);
--
2.25.1