Re: [PATCH 7/7] DWARF: add the config option

From: Jiri Slaby
Date: Wed May 10 2017 - 03:39:47 EST


On 05/05/2017, 09:57 PM, Linus Torvalds wrote:
> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby <jslaby@xxxxxxx> wrote:
>> The DWARF unwinder is in place and ready. So introduce the config option
>> to allow users to enable it. It is by default off due to missing
>> assembly annotations.
>
> Who actually ends up using this?

Every SUSE user has been using this for almost a decade and we are not
about to switch to FP for performance reasons as noted by Jiri Kosina.
So SUSE users are going to be exposed to DWARF unwinder for another
decade or so at least.

Therefore, this is another attempt to make the unwinder (in some form)
upstream. Since this was first proposed many years ago, we have been
forced to forward-port it over and over and everyone knows what pain it
is. So it is nice, that this opened the discussion at least.

> Because from the last time we had fancy unwindoers, and all the
> problems it caused for oops handling with absolutely _zero_ upsides
> ever, I do not ever again want to see fancy unwinders with complex
> state machine handling used by the oopsing code.

Well, reliable stack-traces with minimal performance impact thanks to
out-of-band data is hell good reason in my opinion.

> The fact that it gets disabled for KASAN also makes me suspicious. It
> basically means that now all the accesses it does are not even
> validated.

OK, I inclined to disable KASAN when I started cleaning this up for
_performance_ reasons. The system was so slow, that the RCU stall or
soft-lockup detectors came up complaining. From that time, I measured
the bottlenecks and optimized the unwinder so that 1000 iterations of
unwinding takes:

Before:
real 0m1.808s
user 0m0.001s
sys 0m1.807s

After:
real 0m0.018s
user 0m0.001s
sys 0m0.017s

So let me check, whether KASAN still has to be disabled globally. I do
not think so.

OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
holds now for the rest of the current unwinding:
KASAN_SANITIZE_dumpstack.o := n
KASAN_SANITIZE_dumpstack_$(BITS).o := n
KASAN_SANITIZE_stacktrace.o := n

Still, I can let KASAN := y for dwarf.c for testing purposes locally and
smoke-test the unwinder.

> The fact that the most of the code seems to be disabled for the first
> six patches, and then just enabled in the last patch, also seems to
> mean that the series also gets no bisection coverage or testing that
> the individual patches make any sense. (ie there's a lot of code
> inside "CONFIG_DWARF_UNWIND" in the early patches but that config
> option cannot even be enabled until the last patch).

Correct. This was one big patch previously. I separated that patch into
several smaller commits touching different places of the kernel for
easier review.

It does not make sense to test any of the patches separately except the
first. Hence the config option which enables the rest of the series is
the last one. I deemed this as one of possible approaches to split
patches (I have seen this many times in the past.) I can of course
squash this back into a single patch (or two).

> We used to have nasty issues with not just missing dwarf info, but
> also actively *wrong* dwarf info. Compiler versions that generated
> subtly wrong info, because nobody actually really depended on it, and
> the people who had tested it seldom did the kinds of things we do in
> the kernel (eg inline asms etc).

I must admit I am not aware of any issues in this manner during the
years. Again, this unwinder is the default in SUSE kernels since ever,
so we have been using gcc from at least 3.2 to 7. But see below.

> So I'm personally still very suspicious of these things.
>
> Last time I had huge issues with people also always blaming *anything*
> else than that unwinder. It was always "oh, somebody wrote asm without
> getting it right". Or "oh, the compiler generated bad tables, it's not
> *my* fault that now the kernel oopsing code no longer works".

Now we have objtool. My objtool clone:
1) verifies the DWARF data (prepared by Josh)

2) generates DWARF data for assembly -- incomplete yet: see the thread
about x86 assembly cleanup which is a pre-requisite for this to work.
This is BTW the reason why the DWARF unwinder is default-off in this
series yet.

And we can add:
3) fix up the data, if they are wrong

That said, objtool could handle the data so they are correct and
as-expected for the unwinder. Without objtool, the data (and unwinder)
is hopeless (only vdso from all the assembly is annotated.)

> When I asked for more stricter debug table validation to avoid issues,
> it was always "oh, we fixed it, no worries", and then two months later
> somebody hit another issue.

Reasonable, indeed. I am all for strict checking. objtool is to do that.

> Put another way; the last time we did crazy stuff like this, it got
> reverted. For a damn good reason, despite some people being in denial
> about those reasons.

Speaking for myself, having it out-of-tree causes me only troubles with
fwd-porting. So I am all ears to find a path to make this upstream and
maintain this there according to opinions of general kernel-community.
(Which reminds me I didn't add an entry to the MAINTAINERS file.)

thanks,
--
js
suse labs