Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

From: Fangrui Song
Date: Mon Feb 24 2020 - 16:28:34 EST


On 2020-02-24, Nick Desaulniers wrote:
On Mon, Feb 24, 2020 at 5:28 AM Michael Matz <matz@xxxxxxx> wrote:

Hello,

On Sat, 22 Feb 2020, Nick Desaulniers wrote:

> > > > In GNU ld, it seems that .shstrtab .symtab and .strtab are special
> > > > cased. Neither the input section description *(.shstrtab) nor *(*)
> > > > discards .shstrtab . I feel that this is a weird case (probably even a bug)
> > > > that lld should not implement.
> > >
> > > Ok, forget what the tools do for a second: why is .shstrtab special and
> > > why would one want to keep it?
> > >
> > > Because one still wants to know what the section names of an object are
> > > or other tools need it or why?
> > >
> > > Thx.
> > >
> > > --
> > > Regards/Gruss,
> > > Boris.
> > >
> > > https://people.kernel.org/tglx/notes-about-netiquette
> >
> > .shstrtab is required by the ELF specification. The e_shstrndx field in
> > the ELF header is the index of .shstrtab, and each section in the
> > section table is required to have an sh_name that points into the
> > .shstrtab.
>
> Yeah, I can see it both ways. That `*` doesn't glob all remaining
> sections is surprising to me, but bfd seems to be "extra helpful" in
> not discarding sections that are required via ELF spec.

In a way the /DISCARD/ assignment should be thought of as applying to
_input_ sections (as all such section references on the RHS), not
necessarily to output sections. What this then means for sections that
are synthesized by the link editor is less clear. Some of them are
generated regardless (as you noted, e.g. the symbol table and associated
string sections, including section name string table), some of them are
suppressed, and either lead to a follow-up error (e.g. with .gnu.hash), or
to invalid output (e.g. a missing .dynsym for executables simply leads to
segfaults when running them).

Hi Michael, please see my other reply on this thread: https://lkml.org/lkml/2020/2/24/47

Synthesized sections can be matched as well. For example, SECTIONS { .pltfoo : { *(.plt) }} can rename the output section .plt to .pltfoo.
It seems that in GNU ld, the synthesized section is associated with the
original object file, so it can be written as:

SECTIONS { .pltfoo : { a.o(.plt) }}

In lld, you need a wildcard to match the synthesized section: *(.plt)

.rela.dyn is another example.

That's the reason for the perceived inconsistency with behaviour on '*':
its application to synthesized sections. Arguably bfd should be fixed to
also not discard the other essential sections (or alternatively to give an
error when an essential section is discarded). The lld behaviour of e.g.
discarding .shstrtab (or other synthesized sections necessary for valid
ELF output) doesn't make much sense either, though.

I think most input section descriptions *(*) are a misuse. They really
should be INPUT_SECTION_FLAGS(SHF_ALLOC) *(*).
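
To make that suggestion concrete, here is a minimal sketch of a /DISCARD/
rule restricted to allocated input sections. The surrounding SECTIONS
layout and the elided output sections are assumptions for illustration,
not taken from the patch under discussion. Support for
INPUT_SECTION_FLAGS varies by linker and version, so treat this as
untested.

```
SECTIONS
{
  /* ... output sections that should be kept go here (elided) ... */

  /DISCARD/ : {
    /* Match only input sections that have SHF_ALLOC set. Non-alloc
       sections such as .symtab, .strtab and .shstrtab do not carry
       that flag, so this pattern would not match them. */
    INPUT_SECTION_FLAGS(SHF_ALLOC) *(*)
  }
}
```

Written this way, the question of whether a bare *(*) should reach the
section name string table would not come up for this rule, because the
pattern no longer covers non-alloc sections at all.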

Hi Michael, thank you for the precise feedback. Do you have a list of
"synthesized sections necessary for valid ELF output?" Also, could you
point me to the documentation about `*` and its relation to
"synthesized sections necessary for valid ELF output?" This will help
me file a precise bug against LLD.

https://sourceware.org/binutils/docs/ld/Output-Section-Discarding.html#Output-Section-Discarding

has a few words on this topic. A large part is implementation defined.
In GNU ld, the implementation is mostly in ld/ldlang.c and ld/ldexp.c
(very long).