Re: [PATCH v3 4/5] prctl: Hook L1D flushing in via prctl

From: Thomas Gleixner
Date: Fri Dec 04 2020 - 17:20:17 EST


Balbir,

On Fri, Nov 27 2020 at 17:59, Balbir Singh wrote:
> +enum l1d_flush_out_mitigations {
> + L1D_FLUSH_OUT_OFF,
> + L1D_FLUSH_OUT_ON,
> +};
> +
> +static enum l1d_flush_out_mitigations l1d_flush_out_mitigation __ro_after_init = L1D_FLUSH_OUT_ON;

Why default on and why stays it on when the CPU is not affected by L1TF ...

> /* Default mitigation for TAA-affected CPUs */
> static enum taa_mitigations taa_mitigation __ro_after_init = TAA_MITIGATION_VERW;
> static bool taa_nosmt __ro_after_init;
> @@ -379,6 +386,18 @@ static void __init taa_select_mitigation(void)
> pr_info("%s\n", taa_strings[taa_mitigation]);
> }
>
> +static int __init l1d_flush_out_parse_cmdline(char *str)
> +{
> + if (!boot_cpu_has_bug(X86_BUG_L1TF))
> + return 0;

... while here you check for L1TF.

Also shouldn't it be default off and enabled via command line?

> +static int l1d_flush_out_prctl_get(struct task_struct *task)
> +{
> + int ret;
> +
> + if (l1d_flush_out_mitigation == L1D_FLUSH_OUT_OFF)
> + return PR_SPEC_FORCE_DISABLE;
> +
> + ret = test_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH);

That ret indirection is pointless. Just make it if (test_....)

> +static int l1d_flush_out_prctl_set(struct task_struct *task, unsigned long ctrl)
> +{
> +
> + if (l1d_flush_out_mitigation == L1D_FLUSH_OUT_OFF)
> + return -EPERM;

So here you check for off and then...

> int enable_l1d_flush_for_task(struct task_struct *tsk)
> {
> + /*
> + * Do not enable L1D_FLUSH_OUT if
> + * b. The CPU is not affected by the L1TF bug
> + * c. The CPU does not have L1D FLUSH feature support
> + */
> +
> + if (!boot_cpu_has_bug(X86_BUG_L1TF) ||
> + !boot_cpu_has(X86_FEATURE_FLUSH_L1D))
> + return -EINVAL;

... you check for the feature bits with a malformatted condition at some
other place. It's not supported when these conditions are not there. So
why having this check here?

> +
> set_ti_thread_flag(&tsk->thread_info, TIF_SPEC_L1D_FLUSH);
> return 0;
> }

Aside of that, why is this in tlb.c and not in bugs.c? There is nothing
tlb specific in these enable/disable functions. They just fiddle with
the TIF bit.

> +/*
> + * Sent to a task that opts into L1D flushing via the prctl interface
> + * but ends up running on an SMT enabled core.
> + */
> +static void l1d_flush_kill(struct callback_head *ch)
> +{
> + force_sig(SIGBUS);
> +}
> +
> static inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
> {
> unsigned long next_tif = task_thread_info(next)->flags;
> unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
> + unsigned long next_mm;
>
> BUILD_BUG_ON(TIF_SPEC_L1D_FLUSH != TIF_SPEC_IB + 1);
> - return (unsigned long)next->mm | spec_bits;
> + next_mm = (unsigned long)next->mm | spec_bits;
> +
> + if ((next_mm & LAST_USER_MM_L1D_FLUSH) && this_cpu_read(cpu_info.smt_active)) {

Wheeee. Yet more unconditional checks on every context switch.

> + clear_ti_thread_flag(&next->thread_info, TIF_SPEC_L1D_FLUSH);
> + next->l1d_flush_kill.func = l1d_flush_kill;
> + task_work_add(next, &next->l1d_flush_kill, true);

int task_work_add(struct task_struct *task, struct callback_head *twork,
enum task_work_notify_mode mode);

true is truly not a valid enum constant ....

> + }

So you really want to have:

DEFINE_STATIC_KEY_FALSE(l1dflush_enabled);
static bool l1dflush_mitigation __init_data;

and then with the command line option you set l1dflush_mitigation and in
check_bugs() you invoke l1dflush_select_mitigation() which does:

if (!l1dflush_mitigation || !boot_cpu_has_bug(X86_BUG_L1TF) ||
!boot_cpu_has(X86_FEATURE_FLUSH_L1D))
return;

static_branch_enable(&l1dflush_enabled);

and then in l1d_flush_out_prctl_set()

if (!static_branch_unlikely(&l1dflush_enabled))
return -ENOTSUPP;

Then make the whole switch machinery do:

if (static_branch_unlikely(&l1dflush_enabled)) {
if (unlikely((next_mm | prev_mm) & LAST_USER_MM_L1D_FLUSH))
l1dflush_evaluate(next_mm, prev_mm);
}

and l1dflush_evaluate()

if (prev_mm & LAST_USER_MM_L1D_FLUSH)
l1d_flush();

if ((next_mm & LAST_USER_MM_L1D_FLUSH) &&
this_cpu_read(cpu_info.smt_active)) {

clear_ti_thread_flag(&next->thread_info, TIF_SPEC_L1D_FLUSH);
next->l1d_flush_kill.func = l1d_flush_kill;
task_work_add(next, &next->l1d_flush_kill, TWA_RESUME);
}

That way the overhead is on systems where the admin decides to enable it
and if enabled the evaluation of prev_mm and next_mm is pushed out of
line.

Hmm?

Thanks,

tglx