Re: [PATCH 1/3] sched: add sched_task_call()

From: Josh Poimboeuf
Date: Tue Feb 17 2015 - 09:13:42 EST


On Tue, Feb 17, 2015 at 10:24:50AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 16, 2015 at 04:05:05PM -0600, Josh Poimboeuf wrote:
> > Yeah, I can understand that. I definitely want to avoid touching the
> > scheduler code. Basically I'm trying to find a way to atomically do the
> > following:
> >
> > if (task is sleeping) {
> > walk the stack
> > if (certain set of functions isn't on the stack)
> > set (or clear) a thread flag for the task
> > }
> >
> > Any ideas on how I can achieve that? So far my ideas are:
>
> So far stack unwinding has basically been a best effort debug output
> kind of thing, you're wanting to make the integrity of the kernel depend
> on it.
>
> You require an absolute 100% correctness of the stack unwinder -- where
> today it is; as stated above; a best effort debug output thing.
>
> That is a _big_ change.

True, this does seem to be the first attempt to rely on the correctness
of the stack walking code.

> Has this been properly considered; has all the asm of the relevant
> architectures been audited? Are you planning on maintaining that level
> of audit for all new patches?

I agree, the unwinder needs to be 100% correct. Currently we only
support x86_64. Luckily, in general, stacks are very well defined and
walking the stack of a sleeping task should be straightforward. I don't
think it would be too hard to ensure the stack unwinder is right for
other architectures as we port them.

> Because the way you propose to do things, we'll end up with silent but
> deadly fail if the unwinder is less than 100% correct. No way to easily
> debug that, no warns, just silent corruption.

That's a good point. There's definitely room for additional error
checking in the x86 stack walking code. A couple of ideas:

- make sure it starts from a __schedule() call at the top
- make sure we actually reach the bottom of the stack
- make sure each stack frame's return address is immediately after a
call instruction

If we hit any of those errors, we can bail out, unregister with ftrace
and restore the system to its original state.

> Are you really really sure you want to go do this?

Basically, yes. We've had a lot of conversations about many different
variations of how to do this over the past year, and this is by far the
best approach we've come up with.

FWIW, we've been doing something similar with kpatch and stop_machine()
over the last 1+ years, and have run into zero problems with that
approach.

--
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/