Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry

From: Ingo Molnar
Date: Thu May 07 2015 - 06:48:56 EST



* Rik van Riel <riel@xxxxxxxxxx> wrote:

> > If, on the other hand, you're just going to remotely sample the
> > in-memory context, that sounds good.
>
> It's the latter.
>
> If you look at /proc/<pid>/{stack,syscall,wchan} and other files,
> you will see we already have ways to determine, from in memory
> content, where a program is running at a certain point in time.
>
> In fact, the timer interrupt based accounting does a similar thing.
> It has a task examine its own in-memory state to figure out what it
> was doing before the timer interrupt happened.
>
> The kernel side stack pointer is probably enough to tell us whether
> a task is active in kernel space, on an irq stack, or (maybe) in
> user space. Not convinced about the latter, we may need to look at
> the same state the RCU code keeps track of to see what mode a task
> is in...
>
> I am looking at the code to see what locks we need to grab.
>
> I suspect the runqueue lock may be enough, to ensure that the task
> struct, and stack do not go away while we are looking at them.

That will be enough, especially if you get to the task reference via
rq->curr.

> We cannot take the lock_trace(task) from irq context, and we
> probably do not need to anyway, since we do not care about a precise
> stack trace for the task.

One worry with this and similar approaches to statistically detecting
user mode is that on the way out to user-space we don't really destroy
the previous call trace: we just pop off the stack (non-destructively),
restore RIP and are gone. The stale kernel stack contents can therefore
still look like a live kernel call chain to a remote sampler.

We'll need that percpu flag I suspect.

And once we have the flag, we can get rid of the per syscall RCU
callback as well, relatively easily: with CMPXCHG (in
synchronize_rcu()!) we can reliably sample whether a CPU is in user
mode right now, while the syscall entry/exit path does not need any
atomics: it can just set the flag with a simple MOV.

Once we observe 'user mode', we have observed a quiescent state and
synchronize_rcu() can continue. If we observe kernel mode instead, we
can frob the remote task's TIF_ flags to make it go into a
quiescent-state publishing routine on syscall-return.
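
To make the idea concrete, here is a rough user-space model of the
scheme in C11 atomics. It is only a sketch, not kernel code: the
struct, the function names and the need_qs/qs_published fields are all
made up for illustration (need_qs stands in for a TIF_ flag on the
remote task), and it ignores the race between the sampler setting the
request and the task leaving the kernel, which a real implementation
would have to close.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: none of these names exist in the kernel. */
enum cpu_mode { MODE_USER = 0, MODE_KERNEL = 1 };

#define NR_CPUS_MODEL 4

struct cpu_state {
    _Atomic int mode;         /* the "percpu flag" */
    atomic_bool need_qs;      /* stands in for a TIF_ flag on rq->curr */
    atomic_bool qs_published; /* set by the syscall-return slow path */
};

/* Zero-initialized, so every modeled CPU starts out in MODE_USER. */
static struct cpu_state cpus[NR_CPUS_MODEL];

/* Syscall entry: a plain store, which on x86 compiles to a MOV. */
static void kernel_enter(int cpu)
{
    atomic_store_explicit(&cpus[cpu].mode, MODE_KERNEL,
                          memory_order_relaxed);
}

/* Syscall return: publish a quiescent state if one was requested,
 * then flip the flag back to user mode with another plain store. */
static void kernel_exit(int cpu)
{
    struct cpu_state *c = &cpus[cpu];

    if (atomic_load_explicit(&c->need_qs, memory_order_relaxed)) {
        atomic_store(&c->need_qs, false);
        atomic_store(&c->qs_published, true);
    }
    atomic_store_explicit(&c->mode, MODE_USER, memory_order_relaxed);
}

/* synchronize_rcu() side: sample the remote flag with a
 * compare-and-swap (CMPXCHG on x86). The CAS rewrites MODE_USER with
 * MODE_USER, so it never changes the flag; it only gives the sampler
 * a fully ordered read. Returns true if a quiescent state was seen. */
static bool sample_cpu_quiescent(int cpu)
{
    struct cpu_state *c = &cpus[cpu];
    int expected = MODE_USER;

    if (atomic_compare_exchange_strong(&c->mode, &expected, MODE_USER))
        return true;

    /* Kernel mode observed: ask the task to report on its way out. */
    atomic_store(&c->need_qs, true);
    return false;
}
```

The point of the split is that all the expensive ordering lives in the
rare sampler, while the hot entry/exit path only does plain stores.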

The only hard requirement of this scheme from the RCU synchronization
POV is that all kernel contexts that may touch RCU state need to flip
this flag reliably to 'kernel mode': i.e. all irq handlers, traps,
NMIs and all syscall variants need to do this.
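
As a sketch of what "flip the flag reliably" could look like when
entries nest (an irq or NMI can arrive in either user or kernel mode),
one option is for every entry point to save the previous value and
restore it on the way out. This is an assumption on my part, not an
existing kernel interface; ctx_flag, ctx_enter_kernel() and
ctx_leave_kernel() are hypothetical names, and the model covers a
single CPU only.

```c
#include <stdatomic.h>

/* Hypothetical single-CPU model; these names are illustrative only. */
enum ctx_mode { CTX_USER = 0, CTX_KERNEL = 1 };

static _Atomic int ctx_flag; /* zero-initialized: starts in CTX_USER */

/* Called on every kernel entry (syscall, irq, trap, NMI). Returns the
 * mode that was interrupted so the matching exit can restore it. */
static int ctx_enter_kernel(void)
{
    int prev = atomic_load_explicit(&ctx_flag, memory_order_relaxed);

    atomic_store_explicit(&ctx_flag, CTX_KERNEL, memory_order_relaxed);
    return prev;
}

/* Called on the matching exit: an irq that interrupted kernel code
 * leaves the flag at CTX_KERNEL, one that interrupted user code puts
 * it back to CTX_USER. */
static void ctx_leave_kernel(int prev)
{
    atomic_store_explicit(&ctx_flag, prev, memory_order_relaxed);
}
```

With this shape a nested irq inside a syscall never makes the CPU look
like it is in user mode while kernel code is still on the stack.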

But once it's there, it's really neat.

Thanks,

Ingo