Re: [PATCH 5/6] nohz: support PR_DATAPLANE_STRICT mode

From: Andy Lutomirski
Date: Sat May 09 2015 - 03:29:14 EST


On May 8, 2015 11:44 PM, "Chris Metcalf" <cmetcalf@xxxxxxxxxx> wrote:
>
> With QUIESCE mode, the task is in principle guaranteed not to be
> interrupted by the kernel, but only if it behaves. In particular,
> if it enters the kernel via system call, page fault, or any of
> a number of other synchronous traps, it may be unexpectedly
> exposed to long latencies. Add a simple flag that puts the process
> into a state where any such kernel entry is fatal.
>
> To allow the state to be entered and exited, we add an internal
> bit to current->dataplane_flags that is set when prctl() sets the
> flags. That way, when we are exiting the kernel after calling
> prctl() to forbid future kernel exits, we don't get immediately
> killed.

Is there any reason this can't already be addressed in userspace using
/proc/interrupts or perf_events? ISTM the real goal here is to detect
when we screw up and fail to avoid an interrupt, and killing the task
seems like overkill to me.
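
To make the userspace suggestion concrete, here is a minimal sketch (not
taken from the patch series) of how a monitoring thread might total one
CPU's column of /proc/interrupts and compare snapshots taken before and
after a dataplane run. The helper names count_cpu_columns, row_count and
column_total are made up for illustration. The code assumes the usual
layout: a header line of CPUn labels, then rows of "label: count count ...
description". The column index counts the CPUs listed in the header, which
is not necessarily the CPU number if some CPUs are offline, and lines are
assumed to fit in the buffer.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count the "CPUn" labels on the header line of /proc/interrupts. */
static int count_cpu_columns(const char *header)
{
    int n = 0;

    for (const char *p = header; (p = strstr(p, "CPU")) != NULL; p += 3)
        n++;
    return n;
}

/*
 * Return the counter in 0-based column `col` of one row, or 0 if the row
 * has fewer numeric columns. Single-valued rows such as ERR: and MIS:
 * therefore only contribute to column 0.
 */
static unsigned long long row_count(const char *row, int col)
{
    const char *p = strchr(row, ':');

    if (p == NULL)
        return 0;
    p++;
    for (int i = 0; ; i++) {
        char *end;
        unsigned long long v;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            return 0;               /* ran into the description text */
        v = strtoull(p, &end, 10);
        if (i == col)
            return v;
        p = end;
    }
}

/* Sum every row's counter for column `col` from an open snapshot. */
static unsigned long long column_total(FILE *f, int col)
{
    char line[4096];                /* assumption: lines fit in 4 KiB */
    unsigned long long total = 0;

    if (fgets(line, sizeof line, f) == NULL)
        return 0;
    if (col < 0 || col >= count_cpu_columns(line))
        return 0;
    while (fgets(line, sizeof line, f) != NULL)
        total += row_count(line, col);
    return total;
}
```

A caller would fopen("/proc/interrupts", "r"), call column_total() for the
isolated CPU's column before and after the run, and log any difference
instead of killing anything.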

Also, can we please stop further torturing the exit paths? We have a
disaster of assembly code that calls into syscall_trace_leave and
do_notify_resume. Those functions, in turn, *both* call user_enter
(WTF?), and on very brief inspection user_enter makes it into the nohz
code through multiple levels of indirection, which, with these
patches, has yet another conditionally enabled helper, which does this
new stuff. It's getting to be impossible to tell what happens when we
exit to user space any more.

Also, I think your code is buggy. There's no particular guarantee
that user_enter is only called once between sys_prctl and the final
exit to user mode (see the above WTF), so you might spuriously kill
the process.
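
A toy model of the failure mode, assuming the internal bit behaves as a
one-shot grace flag that the first user_enter after prctl() consumes. This
is my reading of the patch description, not the patch's actual code, and
every name below (DP_STRICT, DP_PENDING, toy_task, toy_prctl_set_strict,
toy_user_enter) is hypothetical:

```c
#include <stdbool.h>

#define DP_STRICT  0x1u   /* hypothetical: strict mode requested */
#define DP_PENDING 0x2u   /* hypothetical: internal one-shot grace bit */

struct toy_task {
    unsigned int flags;
    bool killed;
};

/* Stand-in for the prctl() path: enable strict mode and arm the grace bit. */
static void toy_prctl_set_strict(struct toy_task *t)
{
    t->flags = DP_STRICT | DP_PENDING;
}

/* Stand-in for the strict-mode check made on each user_enter call. */
static void toy_user_enter(struct toy_task *t)
{
    if (!(t->flags & DP_STRICT))
        return;
    if (t->flags & DP_PENDING) {
        t->flags &= ~DP_PENDING;    /* first call: consume the grace bit */
        return;
    }
    t->killed = true;               /* any later call looks like a new entry */
}
```

In this model, one toy_user_enter() call after toy_prctl_set_strict()
leaves the task alive, but a second call on the same exit path sets
killed even though the task never re-entered the kernel.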

Also, I think that most users will be quite surprised if "strict
dataplane" code causes any machine check on the system to kill their
dataplane task. Similarly, a user accidentally running perf record -a
should probably get some reasonable semantics rather than a dead task.
/proc/interrupts gets that right as is. Sure, MCEs will hurt your RT
performance, but Intel screwed up the way that MCEs work, so we should
make do.

--Andy