Re: [PATCH v10 03/20] x86/stackvalidate: Compile-time stack validation

From: Josh Poimboeuf
Date: Thu Aug 20 2015 - 00:00:59 EST


On Wed, Aug 19, 2015 at 12:01:38PM +0200, Ingo Molnar wrote:
>
> * Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> > On Sat, Aug 15, 2015 at 12:23:54AM -0700, Andrew Morton wrote:
> > > On Thu, 13 Aug 2015 22:10:24 -0500 Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > >
> > > > This adds a CONFIG_STACK_VALIDATION option which enables a host tool
> > > > named stackvalidate which runs at compile time.
> > >
> > > It would be useful to
> > >
> > > - show example output for a typical error site
> > >
> > > - describe the consequences of that error (ie: why should we fix
> > > these things?)
> > >
> > > - describe what needs to be done to repair these sites.
> > >
> > >
> > > IOW, what do we gain from merging all this stuff?
> >
> > I attempted to do all that in Documentation/stack-validation.txt which
> > is in patch 03/20. Does it address your concerns? Here it is:
>
> I think it answers the first and third question, but not the second one:
>
> - describe the consequences of that error (ie: why should we fix
> these things?)
>
> would be nice to document all that richly.

Ok. I've tried to answer the "why" question from both a broad
perspective ("why do we need stackvalidate?") as well as for each of the
rules that it enforces.

---8<---

Subject: [PATCH] stackvalidate: Document why it's needed

Signed-off-by: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
---
Documentation/stack-validation.txt | 122 +++++++++++++++++++++++++++++++++++--
1 file changed, 118 insertions(+), 4 deletions(-)

diff --git a/Documentation/stack-validation.txt b/Documentation/stack-validation.txt
index c3d3d35..94dad40 100644
--- a/Documentation/stack-validation.txt
+++ b/Documentation/stack-validation.txt
@@ -24,6 +24,101 @@ instructions). Similarly, it knows how to follow switch statements, for
which gcc sometimes uses jump tables.


+Why do we need stack validation?
+--------------------------------
+
+Here are some of the benefits of validating stack metadata:
+
+a) More reliable stack traces for frame pointer enabled kernels
+
+ Frame pointers are used for debugging purposes. They allow runtime
+ code and debug tools to be able to walk the stack to determine the
+ chain of function call sites that led to the currently executing
+ code.
+
+ For some architectures, frame pointers are enabled by
+ CONFIG_FRAME_POINTER. For some other architectures they may be
+ required by the ABI (sometimes referred to as "backchain pointers").
+
+ For C code, gcc automatically generates instructions for setting up
+ frame pointers when the -fno-omit-frame-pointer option is used.
+
+ But for asm code, the frame setup instructions have to be written by
+ hand, which most people don't do. So the end result is that
+ CONFIG_FRAME_POINTER is honored for C code but not for most asm code.
+
+ For stack traces based on frame pointers to be reliable, all
+ functions which call other functions must first create a stack frame
+ and update the frame pointer. If a first function doesn't properly
+ create a stack frame before calling a second function, the *caller*
+ of the first function will be skipped on the stack trace.
+
+ The benefit of stackvalidate here is that it ensures that *all*
+ functions honor CONFIG_FRAME_POINTER. As a result, no functions will
+ ever [*] be skipped on a stack trace.
+
+ [*] unless an interrupt or exception has occurred at the very
+ beginning of a function before the stack frame has been created,
+ or at the very end of the function after the stack frame has been
+ destroyed. This is an inherent limitation of frame pointers.
+
+b) 100% reliable stack traces for DWARF enabled kernels
+
+ (NOTE: This is not yet implemented)
+
+ As an alternative to frame pointers, DWARF Call Frame Information
+ (CFI) metadata can be used to walk the stack. Unlike frame pointers,
+ CFI metadata is out of band. So it doesn't affect runtime
+ performance and it can be reliable even when interrupts or exceptions
+ are involved.
+
+ For C code, gcc automatically generates DWARF CFI metadata. But for
+ asm code, generating CFI is a tedious manual approach which requires
+ manually placed .cfi assembler macros to be scattered throughout the
+ code. It's clumsy and very easy to get wrong, and it makes the real
+ code harder to read.
+
+ Stackvalidate will improve this situation in several ways. For code
+ which already has CFI annotations, it will validate them. For code
+ which doesn't have CFI annotations, it will generate them. So an
+ architecture can opt to strip out all the manual .cfi annotations
+ from their asm code and have stackvalidate generate them instead.
+
+ We might also add a runtime stack validation debug option where we
+ periodically walk the stack from schedule() and/or an NMI to ensure
+ that the stack metadata is sane and that we reach the bottom of the
+ stack.
+
+ So the benefit of stackvalidate here will be that external tooling
+ should always show perfect stack traces. And the same will be true
+ for kernel warning/oops traces if the architecture has a runtime
+ DWARF unwinder.
+
+c) Higher live patching compatibility rate
+
+ (NOTE: This is not yet implemented)
+
+ Currently with CONFIG_LIVEPATCH there's a basic live patching
+ framework which is safe for roughly 85-90% of "security" fixes. But
+ patches can't have complex features like function dependency or
+ prototype changes, or data structure changes.
+
+ There's a strong need to support patches which have the more complex
+ features so that the patch compatibility rate for security fixes can
+ eventually approach something resembling 100%. To achieve that, a
+ "consistency model" is needed, which allows tasks to be safely
+ transitioned from an unpatched state to a patched state.
+
+ One of the key requirements of the currently proposed livepatch
+ consistency model [*] is that it needs to walk the stack of each
+ sleeping task to determine if it can be transitioned to the patched
+ state. If stackvalidate can ensure that stack traces are reliable,
+ this consistency model can be used and the live patching
+ compatibility rate can be improved significantly.
+
+ [*] https://lkml.kernel.org/r/cover.1423499826.git.jpoimboe@xxxxxxxxxx
+
+
Rules
-----

@@ -35,15 +130,26 @@ To achieve the validation, stackvalidate enforces the following rules:
outside of a function, it flags an error since that usually indicates
callable code which should be annotated accordingly.

+ This rule is needed so that stackvalidate can properly identify each
+ callable function in order to analyze its stack metadata.
+
2. Conversely, each section of code which is *not* callable should *not*
be annotated as an ELF function. The ENDPROC macro shouldn't be used
in this case.

+ This rule is needed so that stackvalidate can ignore non-callable
+ code. Such code doesn't have to follow any of the other rules.
+
3. Each callable function which calls another function must have the
correct frame pointer logic, if required by CONFIG_FRAME_POINTER or
the architecture's back chain rules. This can by done in asm code
with the FRAME_BEGIN/FRAME_END macros.

+ This rule ensures that frame pointer based stack traces will work as
+ designed. If function A doesn't create a stack frame before calling
+ function B, the _caller_ of function A will be skipped on the stack
+ trace.
+
4. Dynamic jumps and jumps to undefined symbols are only allowed if:

a) the jump is part of a switch statement; or
@@ -51,10 +157,18 @@ To achieve the validation, stackvalidate enforces the following rules:
b) the jump matches sibling call semantics and the frame pointer has
the same value it had on function entry.

+ This rule is needed so that stackvalidate can reliably analyze all of
+ a function's code paths. If a function jumps to code in another
+ file, and it's not a sibling call, stackvalidate has no way to follow
+ the jump because it only analyzes a single file at a time.
+
5. A callable function may not execute kernel entry/exit instructions.
The only code which needs such instructions is kernel entry code,
which shouldn't be be in callable functions anyway.

+ This rule is just a sanity check to ensure that callable functions
+ return normally.
+

Errors in .S files
------------------
@@ -63,8 +177,8 @@ If you're getting an error in a compiled .S file which you don't
understand, first make sure that the affected code follows the above
rules.

-Here are some examples of common problems and suggestions for how to fix
-them.
+Here are some examples of common warnings reported by stackvalidate,
+what they mean, and suggestions for how to fix them.


1. stackvalidate: asm_file.o: func()+0x128: call without frame pointer save/setup
@@ -73,8 +187,8 @@ them.
updating the frame pointer.

If func() is indeed a callable function, add proper frame pointer
- logic using the FP_SAVE and FP_RESTORE macros. Otherwise, remove its
- ELF function annotation by changing ENDPROC to END.
+ logic using the FRAME_BEGIN and FRAME_END macros. Otherwise, remove
+ its ELF function annotation by changing ENDPROC to END.

If you're getting this error in a .c file, see the "Errors in .c
files" section.
--
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/