Re: BUG_ON() in workingset_node_shadows_dec() triggers

From: Kees Cook
Date: Wed Oct 05 2016 - 18:17:19 EST


On Wed, Oct 5, 2016 at 2:46 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Oct 5, 2016 at 2:14 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> Now, it can be argued that killing the process part should be
>> configurable and that the code should be written to handle a WARN and
>> clean up and error out nicely. But I still want to retain the "kill
>> the process immediately" behavior in some capacity.
>
> If "some capacity" is "can't do user space accesses", we could easily
> force a SIGKILL of the current process. It won't die immediately in
> the kernel, but it won't be returning to user space either.

With my more paranoid desires, I would prefer to keep "stop kernel
execution with the state set up by this process", not just "make the
process never return to user-space". I would need to meditate on
whether what I really want is just "panic on Oops" or not, though.
Right now, for example, I don't use panic-on-oops when running lkdtm
tests since each test gets (correctly) killed and the Oops can be
examined for the expected failure mode, all without bringing down the
entire system.

> The problem with the immediate kill is that it can be in interrupt
> context, or just holding arbitrary locks. And it's hard to even tell
> dynamically (sometimes you can see it: with preemption enabled you can
> tell "am I in a non-preempt area", for example, but it ends up
> depending on config options).

Yeah, I've seen some hilarious failure modes while building lkdtm
tests for various kernel self-protections.

> And *if* we make BUG() actually do something sane (non-trapping), we
> can easily make it be generic, not arch-specific. In fact, I'd
> implement it by just adding a "handle_bug()" in kernel/panic.c...

Yeah, I'm not sure what the right next step would be. Do we need a new
set of functions between WARN and BUG? Or maybe extract the
process-killing logic on a per-arch level and make it a specific API
so that it can be explicitly called as part of error-handling? Hmm

-Kees

--
Kees Cook
Nexus Security