Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

From: Michael Matz
Date: Mon Sep 12 2022 - 10:17:50 EST


Hey,

On Mon, 12 Sep 2022, Borislav Petkov wrote:

> Micha, any opinions on the below are appreciated.
>
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:

> > difficult to ensure correctness. Also, due to kernel live patching, the
> > kernel relies on 100% correctness of unwinding metadata, whereas the
> > toolchain treats it as a best effort.

Unwinding certainly is not best effort. It's 100% reliable as far as the
source language or compilation options require. But as it doesn't
touch the discussed features I won't belabor that point.

I will mention that objtool's existence is based on mistrust, of persons
(not correctly annotating stuff) and of tools (not correctly heeding those
annotations). The mistrust in persons is understandable and can be dealt
with by tools, but the mistrust in tools can't be fixed by making tools
more complicated by emitting even more information; there's no good reason
to assume that one piece of info can be trusted more than other pieces.
So, if you mistrust the tools you have already lost. That's somewhat
philosophical, so I won't beat that horse much more either.

Now, recovering the CFG. I'll switch order of your two items:

2) noreturn function

> > .pushsection .annotate.noreturn
> > .quad func1
> > .quad func2
> > .quad func3
> > .popsection

This won't work for indirect calls to noreturn functions:

void (* __attribute__((noreturn)) noretptr)(void);
int callnoret (int i)
{
noretptr();
return i + 32;
}

The return statement is unreachable (and removed by GCC). To know that
you would have to mark the call statements, not the individual functions.
All schemes that mark functions that somehow indicates a meaningful
difference in the calling sequence (e.g. the ABI of functions) have the
same problem: it's part of the call expressions type, not of individual
decls.

Second problem: it's not extensible. Today it's noreturn functions you
want to know, and tomorrow? So, add a flag word per entry, define bit 0
for now to be NORETURN, and see what comes. Add a header with a version
(and/or identifier) as well and it's properly extensible. For easy
linking and identifying the blobs in the linked result include a length in
the header. If this were in an allocated section it would be a good idea
to refer to the symbols in a PC-relative manner, so as to not result in
runtime relocations. In this case, as it's within a non-alloc section
that doesn't matter. So:

.section .annotate.functions
.long 1 # version
.long 0xcafe # ident
.long 2f-1f # length
1:
.quad func1, 1 # noreturn
.quad func2, 1 # noreturn
.quad func3, 32 # something_else_but_not_noreturn
...
2:
.long 1b-2b # align and "checksum"

It might be that the length-and-header scheme is cumbersome if you need to
write those section commands by hand, in which case another scheme might
be preferrable, but it should somehow be self-delimiting.

For the above problem of indirect calls to noreturns, instead do:

.text
noretcalllabel:
call noreturn
othercall:
call really_special_thing
.section .annotate.noretcalls
.quad noretcalllabel, 1 # noreturn call
.quad othercall, 32 # call to some special(-ABI?) function

Same thoughts re extensibility and self-delimitation apply.

1) jump tables

> > Create an .annotate.jump_table section which is an array of the
> > following variable-length structure:
> >
> > struct annotate_jump_table {
> > void *indirect_jmp;
> > long num_targets;
> > void *targets[];
> > };

It's very often the case that the compiler already emits what your
.targets[] member would encode, just at some unknown place, length and
encoding. So you would save space if you instead only remember the
encoding and places of those jump tables:

struct {
void *indirect_jump;
long num_tables;
struct {
unsigned num_entries;
unsigned encoding;
void *start_of_table;
} tables[];
};

The usual encodings are: direct, PC-relative, relative-to-start-of-table.
Usually for a specific jump instruction there's only one table, so
optimizing for that makes sense. For strange unthought-of cases it's
probably a good idea to have your initial scheme as fallback, which could
be indicated by a special .encoding value.

> > For example, given the following switch statement code:
> >
> > .Lswitch_jmp:
> > // %rax is .Lcase_1 or .Lcase_2
> > jmp %rax

So, usually %rax would point into a table (somewhere in .rodata/.text)
that looks like so:

.Ljump_table:
.quad .Lcase_1 - .Ljump_table
.quad .Lcase_2 - .Ljump_table

(for position-independend code)

and hence you would emit this as annotation:

.quad .Lswitch_jmp
.quad 1 # only a single table
.long 2 # with two entries
.long RELATIVE_TO_START # all entries are X - start_of_table
.quad .Ljump_table

In this case you won't save anything of course, but as soon as there's a
meaningful number of cases you will.

Again, if that info would be put into an allocated section you would want
to use relative encodings of the addresses to avoid runtime relocs. And
the remarks about self-delimitation and extensibility also apply here.

> > Alternatives
> > ------------
> >
> > Another idea which has been floated in the past is for objtool to read
> > DWARF (or .eh_frame) to help it figure out the control flow. That
> > hasn't been tried yet, but would be considerably more difficult and
> > fragile IMO.

While noreturn functions are marked in the debug info, noreturn
function types currently aren't quite correct. And jump-tables aren't
marked at all, so that would lose.


Ciao,
Michael.