Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

From: Linus Torvalds
Date: Sat Sep 30 2006 - 20:25:49 EST

On Sun, 1 Oct 2006, Andi Kleen wrote:
> On i386 it is simpler (only one interrupt stack and one process stack)
> However there can be still nasty corner cases, like the temporary NMI stacks
> that were added recently. It could be probably all handled in a state machine,
> but it would be ugly (at least I heard complaints about the x86 code that
> does it)

No, I don't understand AT ALL why you are so hung up about this state
machine thing. It's not true. There's simply no state machine involved,
although from a theoretical standpoint you can obviously always implement
just about _anything_ as a state machine.

> > Read the x86 code. Please. The one _before_ you added unwinding.
> It's still the same if you disable unwinding.

I think your problem is that you think that "unwinding" needs to handle
all the page crossing. That's incorrect, and that just results in stupid
and unworkable code.

Instead (and this is why I was trying to point you to the original
pre-unwinding code on i386), what you should do is to see it as two
totally independent phases:

- you have an outer loop that loops around the pages (since the _kernel_
controls the stack nesting at that level). This is the loop I quoted at

- you have a _separate_ "unwinder()" for each page. It only unwinds
within that one page, and if the frame moves away from the page, it
immediately just returns that address, but it knows that it cannot be a
"valid" unwind address within that page.

That separate "unwind within one page" can well be using the dwarf info:
it only really needs to verify that
- it stays within the same page
- the unwind info results in an aligned pointer at a strictly higher
those two tests are trivial, and _guarantee_ that we don't access any
half-way untrustworthy pointer.

See? No state machine. And notice how the dwarf info absolutely does _not_
need to know about the magic page-crossing events like interrupts,
exceptions or anything else. Very much on purpose.

This is what we used to do with %ebp following (at least on x86), and what
I tried to explain. Nothing complicated. And it's easy to set up, and the
dawrf unwinder (which depends on complex data structures) can be trivially
verified (ie it just stops immediately when it doesn't understand
something, and "crosses a page" is such an event).

And the page-crossing events don't need to know _anything_ about the dwarf
format (or %ebp, or any other unwinding details), and it can just depend
on the chain of pages (which is trivial to set up - if the exception pages
aren't already using the same format as the interrupt stack pages, it
should at least not be hard to make them do that).

And the page-level unwinding data format is trivial. I don't think we even
bothered verifying it on x86, but I guess some simple sanity-checking even
there might be worth it (but unlike any dwarf unwinding, this is _not_ at
all complicated, and there are absolutely _zero_ issues about compiler and
linker versions etc etc).

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at