Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation

From: Ingo Molnar
Date: Tue Apr 30 2019 - 01:06:27 EST



* Andy Lutomirski <luto@xxxxxxxxxx> wrote:

> On Sat, Apr 27, 2019 at 3:46 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> > So I'm wondering whether there's a 4th choice as well, which avoids
> > control flow corruption *before* it happens:
> >
> > - A C language runtime that is a subset of current C syntax and
> > semantics used in the kernel, and which doesn't allow access outside
> > of existing objects and thus creates a strictly enforced separation
> > between memory used for data, and memory used for code and control
> > flow.
> >
> > - This would involve, at minimum:
> >
> > - tracking every type and object and its inherent length and valid
> > access patterns, and never losing track of its type.
> >
> > - being a lot more organized about initialization, i.e. no
> > uninitialized variables/fields.
> >
> > - being a lot more strict about type conversions and pointers in
> > general.
>
> You're not the only one to suggest this. There are at least a few
> things that make this extremely difficult if not impossible. For
> example, consider this code:
>
> void maybe_buggy(void)
> {
>         int a, b;
>         int *p = &a;
>         int *q = (int *)some_function((unsigned long)p);
>         *q = 1;
> }
>
> If some_function(&a) returns &a, then all is well. But if
> some_function(&a) returns &b or even a valid address of some unrelated
> kernel object, then the code might be entirely valid and correct C,
> but I don't see how the runtime checks are supposed to tell whether
> the resulting address is valid or is a bug. This type of code is, I
> think, quite common in the kernel -- it happens in every data
> structure where we have unions of pointers and integers or where we
> steal some known-zero bits of a pointer to store something else.
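
( For reference, the bit-stealing pattern mentioned above looks roughly
  like this - a self-contained sketch with made-up names, not actual
  kernel code:

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Sketch only: hypothetical object and pack()/unpack() helpers.
   * An object that is at least 4-byte aligned has its low two bits
   * clear, so they can carry a small tag alongside the pointer. */
  struct node {
          int value;
  } __attribute__((aligned(4)));

  static uintptr_t pack(struct node *p, unsigned int tag)
  {
          assert(tag < 4);                 /* tag must fit in the spare bits */
          assert(((uintptr_t)p & 3) == 0); /* pointer must be 4-byte aligned */
          return (uintptr_t)p | tag;
  }

  static struct node *unpack_ptr(uintptr_t v)
  {
          return (struct node *)(v & ~(uintptr_t)3);
  }

  static unsigned int unpack_tag(uintptr_t v)
  {
          return v & 3;
  }

  int main(void)
  {
          struct node n = { .value = 42 };
          uintptr_t packed = pack(&n, 2);

          /* A checking runtime would have to know that the low bits
           * are a tag, not part of the address, to validate this
           * dereference. */
          printf("%d (tag %u)\n", unpack_ptr(packed)->value,
                 unpack_tag(packed));
          return 0;
  }

  Once the pointer has made a round trip through an integer like this,
  the type information is gone as far as the compiler is concerned. )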

So the thing is, out of the infinitely large state space of "valid C
code" we already disallow infinitely many variants in the Linux kernel.

We have complicated rules that disallow certain C syntactic and semantic
constructs, both at the tooling level (build failures/warnings) and at
the review level (style/taste).

So the question IMHO isn't whether it's "valid C", because we already
have the Linux kernel's own C syntax variant and are enforcing it with
varying degrees of success.

The question is whether the example you gave can be written in a strongly
typed fashion, whether it makes sense to do so, and what the costs are.

I think it's evident that it can be written with strongly typed
constructs, by separating pointers from embedded error codes - with
negative side effects on code generation: for example it increases
structure sizes and lengthens error return paths.
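
Something like this minimal, self-contained sketch (hypothetical
'struct obj_result' and obj_lookup(), not an existing kernel API)
illustrates the idea and the cost:

  #include <errno.h>
  #include <stddef.h>
  #include <stdio.h>

  struct obj {
          int id;
  };

  /* Sketch only: instead of multiplexing "pointer or error" into a
   * single return value (the kernel's ERR_PTR() pattern), keep the
   * pointer and the error code in separately typed fields. */
  struct obj_result {
          struct obj *ptr;  /* valid object on success, else NULL */
          int err;          /* 0 on success, negative errno on failure */
  };

  static struct obj objects[2] = { { .id = 0 }, { .id = 1 } };

  static struct obj_result obj_lookup(int id)
  {
          struct obj_result res = { .ptr = NULL, .err = -ENOENT };

          if (id >= 0 && id < 2) {
                  res.ptr = &objects[id];
                  res.err = 0;
          }
          /* Struct return: wider than a bare pointer, which is where
           * the extra memory and register pressure comes from. */
          return res;
  }

  int main(void)
  {
          struct obj_result r = obj_lookup(1);

          if (r.err)
                  fprintf(stderr, "lookup failed: %d\n", r.err);
          else
                  printf("found object %d\n", r.ptr->id);
          return 0;
  }

The pointer field never carries anything but a pointer (or NULL), so a
checking runtime can validate every dereference - at the cost of the
wider return type and the extra error field.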

I think there are four main consequences of converting such a pattern to
strongly typed constructs:

- memory/cache footprint: there's a nonzero cost there.
- performance: this will hurt too.
- code readability: this will probably improve.
- code robustness: this will improve too.

So I think the proper question to ask is not whether there's common C
syntax within the kernel that would have to be rewritten, but whether the
total memory and runtime overhead of strongly typed C programming (if
it's possible/desirable at all) is larger than the total overhead a
typical Linux distro incurs by enabling the various current and proposed
kernel hardening features that have a runtime cost:

- the SMAP/SMEP overhead of STAC/CLAC for every single user copy

- other usercopy hardening features

- stackprotector

- KASLR

- compiler plugins against information leaks

- proposed KASLR extension to implement module randomization and -PIE overhead

- proposed function call integrity checks

- proposed per-system-call kernel stack offset randomization

- ( and I'm sure I forgot about a few more, and it's all still only
reactive security, not proactive security. )

That's death by a thousand cuts, and CR3 switching during system calls is
also throwing a hand grenade into the fight ;-)

So if people are also proposing to do CR3 switches on every system call,
I'm pretty sure the answer is "yes, even a managed C runtime is probably
faster than *THAT* sum of a performance mess" - at least with the CR3
switching costs of current x86 uarchs...

Thanks,

Ingo