Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

From: Arnd Bergmann
Date: Thu Aug 04 2016 - 08:11:11 EST


On Thursday, August 4, 2016 9:47:13 PM CEST Nicholas Piggin wrote:
> On Thu, 04 Aug 2016 12:37:41 +0200 Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > On Thursday, August 4, 2016 11:00:49 AM CEST Arnd Bergmann wrote:
> > > I tried this
> > >
> > > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > > index b5e40ed86e60..89bca1a25916 100755
> > > --- a/scripts/link-vmlinux.sh
> > > +++ b/scripts/link-vmlinux.sh
> > > @@ -44,7 +44,7 @@ modpost_link()
> > > local objects
> > >
> > > if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> > > - objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> > > + objects="${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
> > > else
> > > objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> > > fi
> > >
> > > but that did not seem to change anything, the extra symbols are
> > > still there. I have not tried to understand what that actually
> > > does, so maybe I misunderstood your suggestion.
> > >
> >
> > On a second attempt, I did the same change for vmlinux instead of the
> > module (d'oh), and got a link failure instead:
> >
> >
> > arch/arm/mm/proc-xscale.o: In function `cpu_xscale_do_resume':
> > (.text+0x3d4): undefined reference to `cpu_resume_mmu'
> > arch/arm/kernel/setup.o: In function `setup_arch':
> > ...
> >
> > However, I also see a link failure in some rare configurations
> > with just your patch:
> >
> > arch/arm/lib/lib.a(io-acorn.o): In function `outsl':
> > (.text+0x38): undefined reference to `printk'
> >
> > The problem being a file in a library object that is not referenced,
> > but that references another symbol that is not defined
> > (CONFIG_PRINTK=n).
>
> The first problem is the existing link system is buggy. I think an
> unconditional switch to --whole-archive (at least for modular kernels)
> should probably be done anyway. For example, on powerpc when building
> with --whole-archive, I have:
>
> +dma_noop_alloc
> +dma_noop_free
> +dma_noop_map_page
> +dma_noop_mapping_error
> +dma_noop_map_sg
> +dma_noop_ops
> +dma_noop_supported
> +fdt_add_reservemap_entry
> +fdt_begin_node
> +fdt_create
> +fdt_create_empty_tree
> +fdt_end_node
> +fdt_errtable
> +find_cpio_data
> +ioremap_page_range
>
> find_cpio_data is unnecessary and it's a codesize regression to link it.
> But dma_noop_ops and ioremap_page_range are exported symbols. If I
> reference dma_noop_ops from some random module with otherwise unpatched
> kernel:
>
> ERROR: "dma_noop_ops" [drivers/char/bsr.ko] undefined!

Right, but only on s390, which is the one architecture using this.
I think we should just have a Kconfig symbol for this file that
gets selected by any architecture that needs it.

This is also what we have ended up doing for almost all other
files in lib/

> The real problem is that our linkage requirements are like a shared
> library when we build modular.
>
> We could build a list of exports and make it link objects with those
> symbols, to solve this, but IMO that's just wasting lipstick on a pig.
> But I will to propose a patch to always use --whole-archive, thin
> archives or not, and transition all archs over to it in a few release
> cycles. It just works by luck right now.
>
> Why is it a pig? Because having the linker to notice no external
> references and just skipping the .o completely is trying to use a hammer
> as a scalpel. It's just not a very effective way to eliminate dead code
> -- I pulled in only a handful of unneeded functions by switching it.

If we do that, we may just as well get rid of $(lib-y) in the process and
always use $(obj-y).

> I mean it is a quick simple feature that probably works well enough with
> simple build systems. But not an advanced one that builds almost
> everything on demand and also has loadable modules and must act like a
> shared library.
>
> Real linker DCE is a valid optimisation that can't be replaced by the
> build system of course, but we need to do it properly. Here's what I'm
> working on.
>
> It applies on top of the previous patch I sent, plus some powerpc stuff
> I'm working on that you should be able to just ignore for another arch.
> it's a WIP, but if you can see if it works for arm that would be cool.
>
> It doesn't actually build allyesconfig after this,
> ld: .tmp_vmlinux1: Too many sections: 220655 (>= 65280)
>
> But on a more reasonable configuration (ppc64le)
> text data bss dec filename
> 11191672 1183536 1923820 14299028 vmlinux
> 10625528 861895 1919707 13407130 vmlinux.thin+gc
>
> 10M-552K 1M-314K ~ 13M-870K

Nice!

> And it actually boots too, which is fairly astounding considering that
> it lost half a meg of code and 1/3 of its data. I'm not completely sure
> I've not done something wrong...

Nicolas Pitre has done some related work, adding him to Cc. IIRC we have
actually had multiple implementations of -ffunction-sections/--gc-sections
in the past that people have used in production, but none of them
ever made it upstream.

One question is whether we should bother with --gc-sections at all,
or use full LTO instead.

Arnd
---
(full patch quoted below for Nico, no further comments)

> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index e75e17c..1594072 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -104,6 +104,10 @@ LDFLAGS_vmlinux := $(LDFLAGS_vmlinux-y)
> LDFLAGS_vmlinux += --emit-relocs
> KBUILD_LDFLAGS_MODULE += --emit-relocs
>
> +KBUILD_CFLAGS += -ffunction-sections -fdata-sections
> +LDFLAGS_vmlinux += --gc-sections
> +
> +
> ifeq ($(CONFIG_PPC64),y)
> ifeq ($(call cc-option-yn,-mcmodel=medium),y)
> # -mcmodel=medium breaks modules because it uses 32bit offsets from
> @@ -234,6 +238,8 @@ KBUILD_CFLAGS += $(cpu-as-y)
> archscripts: scripts_basic
> $(Q)$(MAKE) $(build)=arch/powerpc/tools
>
> +CFLAGS_head_$(CONFIG_WORD_SIZE).o = -fno-function-sections
> +
> head-y := arch/powerpc/kernel/head_$(CONFIG_WORD_SIZE).o
> head-$(CONFIG_8xx) := arch/powerpc/kernel/head_8xx.o
> head-$(CONFIG_40x) := arch/powerpc/kernel/head_40x.o
> @@ -245,6 +251,7 @@ head-$(CONFIG_PPC_FPU) += arch/powerpc/kernel/fpu.o
> head-$(CONFIG_ALTIVEC) += arch/powerpc/kernel/vector.o
> head-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += arch/powerpc/kernel/prom_init.o
>
> +
> core-y += arch/powerpc/kernel/ \
> arch/powerpc/mm/ \
> arch/powerpc/lib/ \
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 2da380f..b356e59 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -4,7 +4,10 @@
>
> CFLAGS_ptrace.o += -DUTS_MACHINE='"$(UTS_MACHINE)"'
>
> +ccflags-y += -fno-function-sections -fno-data-sections
> +
> subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
> +subdir-ccflags-y += -fno-function-sections -fno-data-sections
>
> ifeq ($(CONFIG_PPC64),y)
> CFLAGS_prom_init.o += $(NO_MINIMAL_TOC)
> diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
> index 959c131..0856d62 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -56,16 +56,16 @@ SECTIONS
> * in order to optimize stub generation.
> */
> .head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
> - *(.head.text.first_256B);
> + KEEP(*(.head.text.first_256B));
> #ifndef CONFIG_PPC_BOOK3S
> . = 0x100;
> #else
> - *(.head.text.real_vectors);
> - *(.head.text.real_trampolines);
> - *(.head.text.virt_vectors);
> - *(.head.text.virt_trampolines);
> + KEEP(*(.head.text.real_vectors));
> + KEEP(*(.head.text.real_trampolines));
> + KEEP(*(.head.text.virt_vectors));
> + KEEP(*(.head.text.virt_trampolines));
> #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
> - *(.head.data.fwnmi_page);
> + KEEP(*(.head.data.fwnmi_page));
> . = 0x8000;
> #else
> . = 0x7000;
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 6a67ab9..3a35719 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -312,76 +312,76 @@
> /* Kernel symbol table: Normal symbols */ \
> __ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___ksymtab) = .; \
> - *(SORT(___ksymtab+*)) \
> + KEEP(*(SORT(___ksymtab+*))) \
> VMLINUX_SYMBOL(__stop___ksymtab) = .; \
> } \
> \
> /* Kernel symbol table: GPL-only symbols */ \
> __ksymtab_gpl : AT(ADDR(__ksymtab_gpl) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___ksymtab_gpl) = .; \
> - *(SORT(___ksymtab_gpl+*)) \
> + KEEP(*(SORT(___ksymtab_gpl+*))) \
> VMLINUX_SYMBOL(__stop___ksymtab_gpl) = .; \
> } \
> \
> /* Kernel symbol table: Normal unused symbols */ \
> __ksymtab_unused : AT(ADDR(__ksymtab_unused) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___ksymtab_unused) = .; \
> - *(SORT(___ksymtab_unused+*)) \
> + KEEP(*(SORT(___ksymtab_unused+*))) \
> VMLINUX_SYMBOL(__stop___ksymtab_unused) = .; \
> } \
> \
> /* Kernel symbol table: GPL-only unused symbols */ \
> __ksymtab_unused_gpl : AT(ADDR(__ksymtab_unused_gpl) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___ksymtab_unused_gpl) = .; \
> - *(SORT(___ksymtab_unused_gpl+*)) \
> + KEEP(*(SORT(___ksymtab_unused_gpl+*))) \
> VMLINUX_SYMBOL(__stop___ksymtab_unused_gpl) = .; \
> } \
> \
> /* Kernel symbol table: GPL-future-only symbols */ \
> __ksymtab_gpl_future : AT(ADDR(__ksymtab_gpl_future) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___ksymtab_gpl_future) = .; \
> - *(SORT(___ksymtab_gpl_future+*)) \
> + KEEP(*(SORT(___ksymtab_gpl_future+*))) \
> VMLINUX_SYMBOL(__stop___ksymtab_gpl_future) = .; \
> } \
> \
> /* Kernel symbol table: Normal symbols */ \
> __kcrctab : AT(ADDR(__kcrctab) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___kcrctab) = .; \
> - *(SORT(___kcrctab+*)) \
> + KEEP(*(SORT(___kcrctab+*))) \
> VMLINUX_SYMBOL(__stop___kcrctab) = .; \
> } \
> \
> /* Kernel symbol table: GPL-only symbols */ \
> __kcrctab_gpl : AT(ADDR(__kcrctab_gpl) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___kcrctab_gpl) = .; \
> - *(SORT(___kcrctab_gpl+*)) \
> + KEEP(*(SORT(___kcrctab_gpl+*))) \
> VMLINUX_SYMBOL(__stop___kcrctab_gpl) = .; \
> } \
> \
> /* Kernel symbol table: Normal unused symbols */ \
> __kcrctab_unused : AT(ADDR(__kcrctab_unused) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___kcrctab_unused) = .; \
> - *(SORT(___kcrctab_unused+*)) \
> + KEEP(*(SORT(___kcrctab_unused+*))) \
> VMLINUX_SYMBOL(__stop___kcrctab_unused) = .; \
> } \
> \
> /* Kernel symbol table: GPL-only unused symbols */ \
> __kcrctab_unused_gpl : AT(ADDR(__kcrctab_unused_gpl) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___kcrctab_unused_gpl) = .; \
> - *(SORT(___kcrctab_unused_gpl+*)) \
> + KEEP(*(SORT(___kcrctab_unused_gpl+*))) \
> VMLINUX_SYMBOL(__stop___kcrctab_unused_gpl) = .; \
> } \
> \
> /* Kernel symbol table: GPL-future-only symbols */ \
> __kcrctab_gpl_future : AT(ADDR(__kcrctab_gpl_future) - LOAD_OFFSET) { \
> VMLINUX_SYMBOL(__start___kcrctab_gpl_future) = .; \
> - *(SORT(___kcrctab_gpl_future+*)) \
> + KEEP(*(SORT(___kcrctab_gpl_future+*))) \
> VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .; \
> } \
> \
> /* Kernel symbol table: strings */ \
> __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) { \
> - *(__ksymtab_strings) \
> + KEEP(*(__ksymtab_strings)) \
> } \
> \
> /* __*init sections */ \
> @@ -519,6 +519,7 @@
>
> /* init and exit section handling */
> #define INIT_DATA \
> + KEEP(*(SORT(___kentry+*))) \
> *(.init.data) \
> MEM_DISCARD(init.data) \
> KERNEL_CTORS() \
> @@ -695,9 +696,9 @@
> #define INIT_RAM_FS \
> . = ALIGN(4); \
> VMLINUX_SYMBOL(__initramfs_start) = .; \
> - *(.init.ramfs) \
> + KEEP(*(.init.ramfs)) \
> . = ALIGN(8); \
> - *(.init.ramfs.info)
> + KEEP(*(.init.ramfs.info))
> #else
> #define INIT_RAM_FS
> #endif
> diff --git a/include/linux/export.h b/include/linux/export.h
> index 2f9ccbe..a921862 100644
> --- a/include/linux/export.h
> +++ b/include/linux/export.h
> @@ -46,7 +46,7 @@ extern struct module __this_module;
> extern __visible void *__crc_##sym __attribute__((weak)); \
> static const unsigned long __kcrctab_##sym \
> __used \
> - __attribute__((section("___kcrctab" sec "+" #sym), unused)) \
> + __attribute__((section("___kcrctab" sec "+" #sym ",\"a\",@note #"), used)) \
> = (unsigned long) &__crc_##sym;
> #else
> #define __CRC_SYMBOL(sym, sec)
> @@ -57,12 +57,12 @@ extern struct module __this_module;
> extern typeof(sym) sym; \
> __CRC_SYMBOL(sym, sec) \
> static const char __kstrtab_##sym[] \
> - __attribute__((section("__ksymtab_strings"), aligned(1))) \
> + __attribute__((section("__ksymtab_strings" ",\"a\",@note #"), aligned(1))) \
> = VMLINUX_SYMBOL_STR(sym); \
> extern const struct kernel_symbol __ksymtab_##sym; \
> __visible const struct kernel_symbol __ksymtab_##sym \
> __used \
> - __attribute__((section("___ksymtab" sec "+" #sym), unused)) \
> + __attribute__((section("___ksymtab" sec "+" #sym ",\"a\",@note #"), used)) \
> = { (unsigned long)&sym, __kstrtab_##sym }
>
> #if defined(__KSYM_DEPS__)
> diff --git a/include/linux/init.h b/include/linux/init.h
> index aedb254..51393f4 100644
> --- a/include/linux/init.h
> +++ b/include/linux/init.h
> @@ -156,19 +156,20 @@ extern bool initcall_debug;
>
> #ifndef __ASSEMBLY__
>
> -#ifdef CONFIG_LTO
> +#if 1
> /* Work around a LTO gcc problem: when there is no reference to a variable
> * in a module it will be moved to the end of the program. This causes
> * reordering of initcalls which the kernel does not like.
> * Add a dummy reference function to avoid this. The function is
> * deleted by the linker.
> */
> -#define LTO_REFERENCE_INITCALL(x) \
> - ; /* yes this is needed */ \
> - static __used __exit void *reference_##x(void) \
> - { \
> - return &x; \
> - }
> +#define LTO_REFERENCE_INITCALL(sym) \
> + extern typeof(sym) sym; \
> + /* extern const unsigned long __kentry_##sym; */ \
> + static /* __visible */ const unsigned long __kentry_##sym \
> + __used \
> + __attribute__((section("___kentry" "+" #sym ",\"a\",@note #"), used)) \
> + = (unsigned long)&sym;
> #else
> #define LTO_REFERENCE_INITCALL(x)
> #endif
> @@ -222,16 +223,18 @@ extern bool initcall_debug;
>
> #define __initcall(fn) device_initcall(fn)
>
> -#define __exitcall(fn) \
> - static exitcall_t __exitcall_##fn __exit_call = fn
> +#define __exitcall(fn) \
> + static exitcall_t __exitcall_##fn __exit_call = fn; \
>
> -#define console_initcall(fn) \
> - static initcall_t __initcall_##fn \
> - __used __section(.con_initcall.init) = fn
> +#define console_initcall(fn) \
> + static initcall_t __initcall_##fn \
> + __used __section(.con_initcall.init) = fn; \
> + LTO_REFERENCE_INITCALL(__initcall_##fn)
>
> -#define security_initcall(fn) \
> - static initcall_t __initcall_##fn \
> - __used __section(.security_initcall.init) = fn
> +#define security_initcall(fn) \
> + static initcall_t __initcall_##fn \
> + __used __section(.security_initcall.init) = fn; \
> + LTO_REFERENCE_INITCALL(__initcall_##fn)
>
> struct obs_kernel_param {
> const char *str;
> diff --git a/init/Makefile b/init/Makefile
> index 7bc47ee..c4fb455 100644
> --- a/init/Makefile
> +++ b/init/Makefile
> @@ -2,6 +2,8 @@
> # Makefile for the linux kernel.
> #
>
> +ccflags-y := -fno-function-sections -fno-data-sections
> +
> obj-y := main.o version.o mounts.o
> ifneq ($(CONFIG_BLK_DEV_INITRD),y)
> obj-y += noinitramfs.o
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index ef4658f..fb848af 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -37,17 +37,22 @@ info()
> fi
> }
>
> +# Grab all the EXPORT_SYMBOL symbols in the vmlinux build
> +# ${1} - output file
> +exports_extract()
> +{
> + ${NM} -g ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} |
> + grep "R __ksymtab_" |
> + sed 's/.*__ksymtab_\(.*\)$/\1/' > ${1}
> +}
> +
> # Link of vmlinux.o used for section mismatch analysis
> # ${1} output file
> modpost_link()
> {
> local objects
>
> - if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> - objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> - else
> - objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> - fi
> + objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
> ${LD} ${LDFLAGS} -r -o ${1} ${objects}
> }
>
> @@ -60,11 +65,7 @@ vmlinux_link()
> local objects
>
> if [ "${SRCARCH}" != "um" ]; then
> - if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> - objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> - else
> - objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> - fi
> + objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
> ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \
> -T ${lds} ${objects} ${1}
> else
>