Re: [PATCH 1/3] sched: add sched_task_call()

From: Peter Zijlstra
Date: Thu Feb 19 2015 - 05:16:21 EST


On Wed, Feb 18, 2015 at 10:17:53PM -0600, Josh Poimboeuf wrote:
> On Thu, Feb 19, 2015 at 01:20:58AM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 18, 2015 at 11:12:56AM -0600, Josh Poimboeuf wrote:

> > > The next line of attack is patching tasks when exiting the kernel to
> > > user space (system calls, interrupts, signals), to catch all CPU-bound
> > > and some I/O-bound tasks. That's done in patch 9 [1] of the consistency
> > > model patch set.
> >
> > So the HPC people are really into userspace that does for (;;) ; and
> > isolate that on CPUs and have the tick interrupt stopped and all that.
> >
> > You'll not catch those threads on the sysexit path.
> >
> > And I'm fairly sure they'll not want to SIGSTOP/CONT their stuff either.
> >
> > Now its fairly easy to also handle this; just mark those tasks with a
> > _TIF_WORK_SYSCALL_ENTRY flag, have that slowpath wait for the flag to
> > go-away, then flip their state and clear the flag.
>
> I guess you mean patch the task when it makes a syscall? I'm doing that
> already on syscall exit with a bit in _TIF_ALLWORK_MASK and
> _TIF_DO_NOTIFY_MASK.

No, these tasks will _never_ make syscalls. So you need to guarantee
they don't accidentally enter the kernel while you flip them. Something
like so should do.

You set TIF_ENTER_WAIT on them, check they're still in userspace, flip
them then clear TIF_ENTER_WAIT.

---
arch/x86/include/asm/thread_info.h | 4 +++-
arch/x86/kernel/entry_64.S | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e82e95abc92b..baa836f13536 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -90,6 +90,7 @@ struct thread_info {
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define TIF_ADDR32 29 /* 32-bit address space on 64 bits */
#define TIF_X32 30 /* 32-bit native x86-64 binary */
+#define TIF_ENTER_WAIT 31

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
@@ -113,12 +114,13 @@ struct thread_info {
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
#define _TIF_ADDR32 (1 << TIF_ADDR32)
#define _TIF_X32 (1 << TIF_X32)
+#define _TIF_ENTER_WAIT (1 << TIF_ENTER_WAIT)

/* work to do in syscall_trace_enter() */
#define _TIF_WORK_SYSCALL_ENTRY \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \
_TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT | \
- _TIF_NOHZ)
+ _TIF_NOHZ | _TIF_ENTER_WAIT)

/* work to do in syscall_trace_leave() */
#define _TIF_WORK_SYSCALL_EXIT \
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index db13655c3a2a..735566b35903 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -387,6 +387,8 @@ GLOBAL(system_call_after_swapgs)

/* Do syscall tracing */
tracesys:
+ andl $_TIF_ENTER_WAIT,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ jnz tracesys;
leaq -REST_SKIP(%rsp), %rdi
movq $AUDIT_ARCH_X86_64, %rsi
call syscall_trace_enter_phase1

> > > As a last resort, if there are still any tasks which are sleeping on a
> > > to-be-patched function, the user can send them SIGSTOP and SIGCONT to
> > > force them to be patched.
> >
> > You typically cannot SIGSTOP/SIGCONT kernel threads. Also
> > TASK_UNINTERRUPTIBLE sleeps are unaffected by signals.
> >
> > Bit pesky that.. needs pondering.

I still absolutely hate you need to disturb userspace like that. Signals
are quite visible and perturb userspace state.

Also, you cannot SIGCONT a task that was SIGSTOP'ed by userspace for
what they thought was a good reason. You'd wreck their state.

> But now I'm thinking that kthreads will almost never be a problem. Most
> kthreads are basically this:

You guys are way too optimistic; maybe its because I've worked on
realtime stuff too much, but I'm always looking at worst cases. If you
can't handle those, I feel you might as well not bother :-)

> Patching thread_fn wouldn't be possible unless we killed the thread.

It is, see kthread_park().
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/