Re: 2.6.17-rc5-mm2

From: Ingo Molnar
Date: Fri Jun 02 2006 - 03:50:33 EST



* Jan Beulich <jbeulich@xxxxxxxxxx> wrote:

> >firstly, i'd suggest to use another magic value for 'bottom of call
> >stacks' - it is way too common to jump or call a NULL pointer. Something
> >like 0xfedcba9876543210 would be better.
>
> That's contrary to common use (outside of the kernel). I'm opposed to
> this. Detecting an initial bad EIP isn't a problem, and the old code
> can be used easily in that case.

but 0 is pretty much the worst choice for something that needs to be
reliable - it's the most common value of a machine word in existence,
amongst all the 18446744073709551616 possibilities. And we need not care
about userspace's prior choices, this code and data are totally under
the kernel's control.
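
To make the point concrete, here is a minimal sketch of how an unwinder
could tell a planted terminator apart from a stray NULL. The names
STACK_BOTTOM_MAGIC, reached_stack_bottom() and looks_corrupted() are
hypothetical, and the constant is simply the value suggested earlier in
this thread, not something any existing code uses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: the value floated earlier in this thread. */
#define STACK_BOTTOM_MAGIC UINT64_C(0xfedcba9876543210)

/* True only if the unwinder hit a deliberately planted terminator. */
static bool reached_stack_bottom(uint64_t return_address)
{
    return return_address == STACK_BOTTOM_MAGIC;
}

/*
 * With a non-zero sentinel, a NULL return address can be reported as
 * likely corruption instead of being mistaken for a clean end of trace.
 */
static bool looks_corrupted(uint64_t return_address)
{
    return return_address == 0;
}
```

With 0 as the terminator the two predicates would be the same test, so
a jump through a NULL pointer would look like a normal end of stack.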

> >for the RIP/EIP to get corrupted is a common occurrence. So is stack
> >corruption. So the fallback mechanism shouldn't be a 'short while'
> >side-thought, it must be part of the design.
>
> RIP/EIP corruption, as said above, can be easily handled. RSP/ESP
> corruption, as I understand it, isn't being handled in the old code,
> and so I can't see what improvements the new code could do here (given
> that instruction and stack pointers serve as the anchors for kicking
> off an unwind).

i'm not only talking about RSP/ESP corruption, but about stack
corruption. I.e. some area of the stack is corrupted. With the scanning
method we at least get some other entries out - while with the unwind
method we only say 'sorry'.
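
A rough sketch of the scanning method, for illustration only: it walks
every word of a stack area and keeps those that fall inside a code
range. struct text_range and scan_stack() are hypothetical names, and
the single [start, end) range is an assumption; a real kernel would
consult its own text and module address tables:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code-segment bounds, half-open: [start, end). */
struct text_range {
    uintptr_t start;
    uintptr_t end;
};

static int in_text(const struct text_range *t, uintptr_t addr)
{
    return addr >= t->start && addr < t->end;
}

/*
 * Scan all words of the stack and copy out those that look like code
 * addresses. No frame layout is assumed, so corrupted words are simply
 * skipped and later entries are still found. Returns the entry count.
 */
static size_t scan_stack(const uintptr_t *stack, size_t nwords,
                         const struct text_range *text,
                         uintptr_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 0; i < nwords && found < max_out; i++)
        if (in_text(text, stack[i]))
            out[found++] = stack[i];
    return found;
}
```

The output can contain stale or false entries, but a damaged region in
the middle of the stack does not stop the entries after it from being
reported.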

anyway, i think that handling a bad initial RIP/EIP would be a good
first step and it should solve the problem at hand. (it will also serve
as a basis for whatever other heuristics we might want to apply later
on)

> >In all other cases (if we go outside of the stack page(s)) we _must_
> >fall back to the dumb 'scan the stack pages for interesting entries'
> >method, to get the information out! "Uh oh the unwind info somehow got
> >corrupted, sorry" is not enough to debug a kernel bug.
>
> Again, you miss the point that the very last unwind operation must
> always be expected to move the stack pointer outside the stack
> boundaries, which would mean triggering the fallback path in all
> cases. With this, we could as well leave out the entire unwind code
> and keep every one of us manually separating the good and bad
> entries in the trace shown.

no, i don't miss that point at all. What _you_ are missing is the
obvious solution: stacks on x86_64 are already linked to each other, via
fixed-position pointers at the end of the stack pages. So the unwinder
can easily check whether the 'next stack' suggested by the link at the
end of the page is indeed the one the unwind jumpout lands on. If
not => fallback.

same for i386 - there too the stacks are linked via non-unwind data. The
unwinder can do a pretty good verification of the jumpout.
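
The check could look roughly like the sketch below. struct stack_region,
its fields and check_step() are hypothetical and do not reflect the
actual x86_64 or i386 stack layout; the only assumption is that each
stack records which stack it was entered from:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one stack and the stack it links to. */
struct stack_region {
    uintptr_t base;                   /* lowest address of the stack */
    size_t size;                      /* size in bytes */
    const struct stack_region *link;  /* previous stack, NULL if none */
};

static bool on_stack(const struct stack_region *s, uintptr_t sp)
{
    return s && sp >= s->base && sp < s->base + s->size;
}

enum unwind_step { STEP_SAME_STACK, STEP_LINKED_STACK, STEP_FALLBACK };

/*
 * Validate the stack pointer produced by one unwind step. Leaving the
 * current stack is accepted only if the new stack pointer lands on the
 * stack named by the link; anything else requests the scan fallback.
 */
static enum unwind_step check_step(const struct stack_region *cur,
                                   uintptr_t next_sp)
{
    if (on_stack(cur, next_sp))
        return STEP_SAME_STACK;
    if (on_stack(cur->link, next_sp))
        return STEP_LINKED_STACK;
    return STEP_FALLBACK;
}
```

The link is data the unwinder did not compute itself, so agreeing with
it is independent evidence that the jumpout is sane.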

Ingo