Re: [PATCH v7 07/11] arch/x86: enable task isolation functionality

From: Andy Lutomirski
Date: Mon Sep 28 2015 - 16:59:50 EST


On Mon, Sep 28, 2015 at 11:17 AM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
> In prepare_exit_to_usermode(), we would like to call
> task_isolation_enter() on every return to userspace, and like
> other work items, we would like to recheck for more work after
> calling it, since it will enable interrupts internally.
>
> However, if task_isolation_enter() is the only work item,
> and it has already been called once, we don't want to continue
> calling it in a loop. We don't have a dedicated TIF flag for
> task isolation, and it wouldn't make sense to have one, since
> we'd want to set it before starting exit every time, and then
> clear it the first time around the loop.
>
> Instead, we change the loop structure somewhat, so that we
> have a more inclusive set of flags that are tested for on the
> first entry to the function (including TIF_NOHZ), and if any
> of those flags are set, we enter the loop. And, we do the
> task_isolation() test unconditionally at the bottom of the loop,
> but then when making the decision to loop back, we just use the
> set of flags that doesn't include TIF_NOHZ. That way we only
> loop if there is other work to do, but then if that work
> is done, we again unconditionally call task_isolation_enter().
>
> In syscall_trace_enter_phase1(), we try to add the necessary
> support for strict-mode detection of syscalls in an optimized
> way, by letting the code remain unchanged if we are not using
> TASK_ISOLATION, but otherwise calling enter_from_user_mode()
> under the first time we see _TIF_NOHZ, and then waiting until
> after we do the secure computing work to actually clear the bit
> from the "work" variable and call task_isolation_syscall().
>
> Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
> ---
> arch/x86/entry/common.c | 47 ++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 36 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 80dcc9261ca3..0f74389c6f3b 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -21,6 +21,7 @@
> #include <linux/context_tracking.h>
> #include <linux/user-return-notifier.h>
> #include <linux/uprobes.h>
> +#include <linux/isolation.h>
>
> #include <asm/desc.h>
> #include <asm/traps.h>
> @@ -81,7 +82,8 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
> */
> if (work & _TIF_NOHZ) {
> enter_from_user_mode();
> - work &= ~_TIF_NOHZ;
> + if (!IS_ENABLED(CONFIG_TASK_ISOLATION))
> + work &= ~_TIF_NOHZ;
> }
> #endif
>
> @@ -131,6 +133,13 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
> }
> #endif
>
> + /* Now check task isolation, if needed. */
> + if (IS_ENABLED(CONFIG_TASK_ISOLATION) && (work & _TIF_NOHZ)) {
> + work &= ~_TIF_NOHZ;
> + if (task_isolation_strict())
> + task_isolation_syscall(regs->orig_ax);
> + }
> +

This is IMO rather nasty. Can you try to find a way to do this
without making the control flow depend on config options?

What guarantees that TIF_NOHZ is an acceptable thing to check?

> /* Do our best to finish without phase 2. */
> if (work == 0)
> return ret; /* seccomp and/or nohz only (ret == 0 here) */
> @@ -217,10 +226,26 @@ static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
> /* Called with IRQs disabled. */
> __visible void prepare_exit_to_usermode(struct pt_regs *regs)
> {
> + u32 cached_flags;
> +
> if (WARN_ON(!irqs_disabled()))
> local_irq_disable();
>
> /*
> + * We may want to enter the loop here unconditionally to make
> + * sure to do some work at least once. Test here for all
> + * possible conditions that might make us enter the loop,
> + * and return immediately if none of them are set.
> + */
> + cached_flags = READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> + if (!(cached_flags & (TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> + _TIF_UPROBE | _TIF_NEED_RESCHED |
> + _TIF_USER_RETURN_NOTIFY | _TIF_NOHZ))) {
> + user_enter();
> + return;
> + }
> +

Too complicated and too error prone.

In any event, I don't think that the property you actually want is for
the loop to be entered once. I think the property you want is that
we're isolated by the time we're finished. Why not just check that
directly in the loop condition?

> + /*
> * In order to return to user mode, we need to have IRQs off with
> * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
> * _TIF_UPROBE, or _TIF_NEED_RESCHED set. Several of these flags
> @@ -228,15 +253,7 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
> * so we need to loop. Disabling preemption wouldn't help: doing the
> * work to clear some of the flags can sleep.
> */
> - while (true) {
> - u32 cached_flags =
> - READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> -
> - if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> - _TIF_UPROBE | _TIF_NEED_RESCHED |
> - _TIF_USER_RETURN_NOTIFY)))
> - break;
> -
> + do {
> /* We have work to do. */
> local_irq_enable();
>
> @@ -258,9 +275,17 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
> if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> fire_user_return_notifiers();
>
> + if (task_isolation_enabled())
> + task_isolation_enter();
> +

Does anything here guarantee forward progress or at least give
reasonable confidence that we'll make forward progress?

> /* Disable IRQs and retry */
> local_irq_disable();
> - }
> +
> + cached_flags = READ_ONCE(pt_regs_to_thread_info(regs)->flags);
> +
> + } while (!(cached_flags & (TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
> + _TIF_UPROBE | _TIF_NEED_RESCHED |
> + _TIF_USER_RETURN_NOTIFY)));
>
> user_enter();
> }
> --
> 2.1.2
>

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/