Re: [PATCH v3 00/13] Virtually mapped stacks with guard pages (x86, core)

From: Linus Torvalds
Date: Thu Jun 23 2016 - 02:02:27 EST


On Wed, Jun 22, 2016 at 6:22 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> I implemented a percpu cache, and it's useless.
>
> When a task goes away, one reference is held until the next RCU grace
> period so that task_struct can be used under RCU (look for
> delayed_put_task_struct).

Yeah, that RCU batching will screw the cache idea.

But isn't it only the "task_struct" that needs that? That's a separate
allocation from the stack, which contains the "thread_info".

I think that what we *could* do is re-use the thread-info within the
RCU grace period, as long as we delay freeing the task_struct.

Yes, yes, we currently tie the task_struct and thread_info lifetimes
together very tightly, but that's a historical thing rather than a
requirement. We do the

        account_kernel_stack(tsk->stack, -1);
        arch_release_thread_info(tsk->stack);
        free_thread_info(tsk->stack);

in free_task(), but I could imagine doing it earlier, and
independently of the RCU-delayed free.
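
A minimal user-space sketch of that split, under stated assumptions: every
name here (struct model_task, model_release_stack(), model_free_task() and
the two counters) is hypothetical and is not kernel code. It only shows the
stack being dropped early and the pointer cleared, while the task object
itself is freed in a separate, later step standing in for the RCU-delayed
free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct; not the kernel's type. */
struct model_task {
    void *stack;            /* models the stack + thread_info allocation */
};

static int model_stacks_live;   /* outstanding stack allocations */
static int model_tasks_live;    /* outstanding task objects */

static struct model_task *model_task_create(void)
{
    struct model_task *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    t->stack = malloc(16384);   /* size is arbitrary for the model */
    if (!t->stack) {
        free(t);
        return NULL;
    }
    model_stacks_live++;
    model_tasks_live++;
    return t;
}

/* Early, synchronous half: free the stack and clear the pointer. */
static void model_release_stack(struct model_task *t)
{
    free(t->stack);
    t->stack = NULL;            /* catches any odd later use */
    model_stacks_live--;
}

/* Late half: stands in for the free that runs after the grace period. */
static void model_free_task(struct model_task *t)
{
    assert(t->stack == NULL);   /* stack must already be gone */
    free(t);
    model_tasks_live--;
}
```

Between the two calls the task object is still valid for anyone holding a
reference, but it no longer pins a stack.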

In fact, I think we could just do that at exit() time synchronously. The
reference counting of the task_struct is because a lot of other
threads can have references to the exiting thread (and we have the
tasklist and thread lists that are RCU-traversed), but none of those
other references should ever look at the stack. Or even the
thread-info.

Hmm. I bet it would show some problems, but not be technically
impossible. Especially if we make the thread-info rules be like the
SLAB_DESTROY_BY_RCU semantics - the allocation may be re-used during
the RCU grace period, but it is still going to exist and be of the
same type.
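
To illustrate the rule just described, here is a small user-space sketch.
It is an assumption-laden model, not the slab allocator: the fixed pool,
struct pool_info, and the owner_id check are all hypothetical. The point is
only that a stale pointer stays safe to dereference because the memory
never changes type, so a reader has to re-validate identity after the
lookup rather than trust the pointer.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4

/* Hypothetical thread-info-like object; not the kernel's struct. */
struct pool_info {
    int in_use;
    unsigned long owner_id;     /* identity a reader must re-check */
};

/* Slots never leave this array, so the memory is type-stable. */
static struct pool_info pool[POOL_SLOTS];

/* First-fit allocation: a freed slot can be handed out again at once. */
static struct pool_info *pool_info_alloc(unsigned long owner_id)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].owner_id = owner_id;
            return &pool[i];
        }
    }
    return NULL;
}

static void pool_info_free(struct pool_info *info)
{
    info->in_use = 0;           /* no grace period before reuse */
}

/*
 * A reader with a possibly stale pointer may dereference it safely,
 * but must confirm it still describes the object it looked up.
 */
static int pool_info_still_mine(const struct pool_info *info,
                                unsigned long owner_id)
{
    return info->in_use && info->owner_id == owner_id;
}
```

A reader that looked up owner 1, then finds the slot now belongs to owner
2, sees the check fail and has to retry or give up; it never touches freed
or retyped memory.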

This sounds very much like something for Oleg Nesterov.

Oleg, what do you think? Would it be reasonable to free the stack and
thread_info synchronously at exit time, clear the pointer (to catch
any odd use), and only RCU-delay the task_struct itself?

That is, after all, what we already do with the VM, semaphores, files,
fs info etc. There's no real reason I see to keep the stack around.

(Obviously, we can't release it in do_exit() itself like we do some of
the other state - it would need to be released after we've scheduled
away to another process' stack, but we already have that TASK_DEAD
handling in finish_task_switch for this exact reason).
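
The ordering constraint in that parenthesis can be sketched in user space
as follows. This is a toy model under stated assumptions: struct sw_task,
sw_exit() and sw_switch_to() are hypothetical names, and the real
finish_task_switch does far more. It only shows that the exiting task
marks itself dead while still on its own stack, and that the stack is
released by the switch path once another task is current.

```c
#include <assert.h>
#include <stdlib.h>

enum sw_state { SW_RUNNING, SW_DEAD };

/* Hypothetical task model; not the kernel's task_struct. */
struct sw_task {
    enum sw_state state;
    void *stack;
};

static struct sw_task *sw_current;

/*
 * Exit analogue: the task is still running on its own stack here,
 * so it can only mark itself dead, not free the stack.
 */
static void sw_exit(void)
{
    sw_current->state = SW_DEAD;
}

/*
 * Switch analogue: once 'next' is current, nothing runs on the
 * previous task's stack any more, so a dead task's stack can go.
 */
static void sw_switch_to(struct sw_task *next)
{
    struct sw_task *prev = sw_current;

    sw_current = next;
    if (prev && prev->state == SW_DEAD) {
        free(prev->stack);
        prev->stack = NULL;     /* task object itself stays around */
    }
}
```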

Linus