Re: [PATCH] x86: enable Data Operand Independent Timing Mode

From: Ard Biesheuvel
Date: Thu Jan 26 2023 - 05:21:30 EST


On Wed, 25 Jan 2023 at 17:46, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 1/25/23 08:22, Ard Biesheuvel wrote:
> ...
> > All the nospec stuff we added for Spectre v1 serves the same purpose,
> > essentially, although the timing variances due to cache misses are
> > likely easier to measure. IOW, some of the kernel is now written that
> > way in fact, although the author of that doc may have had something
> > else in mind.
> >
> > So IMHO, the scope is really not as narrow as you think.
>
> I've spoken with the folks who wrote that doc. They've told me
> repeatedly that the scope is super narrow. Seriously, look at just
> *one* thing in the other Intel doc about mitigating timing side-channels[1]:
>
> be wary of code generated from high-level language source code
> that appears to adhere to all of these recommendations.
>
> The kernel has a fair amount of code written in high-level languages.
>

This is why we have crypto_memneq(), for instance, which is intended
to run in constant time, whereas the time taken by an ordinary memcmp()
is typically correlated with the index of the first unequal byte. So
what we do there is compare every byte, instead of returning early on
the first mismatch. We do, however, perform the comparison in the
native word size and not byte by byte.
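For illustration, a minimal userspace sketch of the idea described above. It is not the kernel's crypto_memneq() code, and the name ct_memneq() is hypothetical. It compares whole native words, folds every difference into an accumulator with XOR/OR, and never returns early, so the amount of work done does not depend on where (or whether) the buffers differ.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical constant-time inequality check in the spirit of
 * crypto_memneq(): every byte is examined and there is no early exit.
 * Returns 0 if the buffers are equal, non-zero otherwise.
 */
static unsigned long ct_memneq(const void *a, const void *b, size_t size)
{
    const unsigned char *pa = a, *pb = b;
    unsigned long neq = 0;

    /* Bulk of the buffer: one native word per iteration. */
    while (size >= sizeof(unsigned long)) {
        unsigned long wa, wb;

        /* memcpy() avoids unaligned and type-punned loads. */
        memcpy(&wa, pa, sizeof(wa));
        memcpy(&wb, pb, sizeof(wb));
        neq |= wa ^ wb;          /* accumulate, do not branch */
        pa += sizeof(unsigned long);
        pb += sizeof(unsigned long);
        size -= sizeof(unsigned long);
    }

    /* Tail: remaining bytes, still without branching on the data. */
    while (size > 0) {
        neq |= (unsigned long)(*pa++ ^ *pb++);
        size--;
    }

    return neq;
}
```

Nothing in the C language guarantees that the compiler, or the CPU executing the resulting XOR and OR instructions, keeps this free of data-dependent timing, which is exactly the concern being discussed in this thread.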

So if these optimizations could make a word comparison take less time
when the first byte is a mismatch, we definitely have a problem. (This
particular example may be far-fetched, but you get my point.)

> The authors of the DOIT doc truly intend the real-world benefits of
> DOITM to be exceedingly narrow.

I understand that this is the intent. But for privileged execution,
this should really be the other way around: the scope for
optimizations relying on data dependent timing is exceedingly narrow
in the kernel, because any data it processes must be assumed to be
confidential by default (wrt user space), and it will probably be
rather tricky to identify CPU bound workloads in the kernel where data
dependent optimizations are guaranteed to be safe and result in a
significant speedup.

This is basically the same argument I made for arm64.

> I think it would be fair to say that
> they think:
>
> DOITM is basically useless for most code written in C, including
> basically the entire kernel.
>
> I'll forward this on to them and make sure I'm not overstating this
> _too_ much.
>

C code that was not specifically written with data independent timing
in mind may still behave that way today, and C code that *was*
specifically written with data independent timing in mind (such as
crypto_memneq()) could potentially lose that property under these
optimizations.

> >> That's _meant_ to be really scary and keep folks from turning this on by
> >> default, aka. what this patch does. Your new CPU will be really slow if
> >> you turn this on! Boo!
> >
> > What is the penalty for switching it on and off? On arm64, it is now
> > on by default in the kernel, and off by default in user space, and
> > user space can opt into it using an unprivileged instruction.
>
> Right now, DOITM is controlled by a bit in an MSR and it applies
> everywhere. It is (thankfully) one of the cheap MSRs and is not
> architecturally serializing.
>
> That's still not ideal and there is a desire to expose the bit to
> userspace *somehow* to make it much, much cheaper to toggle. But, it'll
> still be an extra bit that needs to get managed and context switched.
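To make the "managed and context switched" cost concrete, a small simulation of one plausible policy: a per-task preference, with the MSR only rewritten when the incoming task wants a different value. Everything here is hypothetical (struct task, fake_wrmsr(), switch_to()), and the MSR number and bit are assumptions taken from Intel's DOIT documentation (IA32_UARCH_MISC_CTL, 0x1b01, bit 0); check them against the SDM before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED: IA32_UARCH_MISC_CTL and its DOITM bit, per Intel's DOIT doc. */
#define MSR_UARCH_MISC_CTL 0x1b01u
#define UARCH_MISC_DOITM   (1ull << 0)

/* Hypothetical per-task state: does this task want DOITM enabled? */
struct task {
    const char *name;
    bool doitm;
};

/* Simulated per-CPU copy of the MSR, plus a count of MSR writes. */
static uint64_t msr_shadow;
static unsigned int msr_writes;

/* Stand-in for wrmsr(); real code would execute the WRMSR instruction. */
static void fake_wrmsr(uint32_t msr, uint64_t val)
{
    (void)msr;
    msr_shadow = val;
    msr_writes++;
}

/*
 * Lazy update on context switch: only write the MSR when the incoming
 * task's preference differs from what is currently programmed.
 */
static void switch_to(const struct task *next)
{
    uint64_t want = next->doitm ? (msr_shadow | UARCH_MISC_DOITM)
                                : (msr_shadow & ~UARCH_MISC_DOITM);

    if (want != msr_shadow)
        fake_wrmsr(MSR_UARCH_MISC_CTL, want);
}
```

This only covers task switches. Running the kernel with DOITM on while user space runs with it off would additionally need a write on every kernel entry and exit, which is the cost the arm64 mechanism avoids.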
>
> When I looked, the arm64 bit seemed to be in some flags register that
> got naturally saved and restored already on user<->kernel transitions.
> Was I reading it right? It seemed like a really nice, simple mechanism
> to me.
>

Indeed. It is part of PSTATE, which means it gets saved to and
restored from SPSR (the Saved Program Status Register) along with the
rest of PSTATE when an exception is taken and returned from.
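A toy model of why this is cheap on arm64: the hardware copies PSTATE (including DIT) into SPSR_EL1 on exception entry and restores it on ERET, so the kernel can set DIT for itself without disturbing user space's choice. The struct and function names are hypothetical, the DIT position (bit 24 of SPSR) is an assumption to check against the Arm ARM, and having the entry path set DIT explicitly is an assumed kernel policy rather than something the hardware does by itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED: DIT is bit 24 of the saved program status on AArch64. */
#define MODEL_DIT (1u << 24)

/* Simplified model of one CPU: live PSTATE and the banked SPSR_EL1. */
struct cpu {
    uint32_t pstate;
    uint32_t spsr_el1;
};

/* Exception entry: hardware saves PSTATE into SPSR_EL1. */
static void take_exception(struct cpu *cpu)
{
    cpu->spsr_el1 = cpu->pstate;
    /* Assumed kernel policy: run with DIT set while in the kernel. */
    cpu->pstate |= MODEL_DIT;
}

/* ERET: hardware restores PSTATE, including DIT, from SPSR_EL1. */
static void exception_return(struct cpu *cpu)
{
    cpu->pstate = cpu->spsr_el1;
}

/* User space opting in or out, as "msr dit, #1" / "#0" would at EL0. */
static void user_set_dit(struct cpu *cpu, bool on)
{
    if (on)
        cpu->pstate |= MODEL_DIT;
    else
        cpu->pstate &= ~MODEL_DIT;
}
```

The model leaves out switching between tasks; in a real kernel the user PSTATE is saved per task along with the rest of the register state, so a per-task DIT preference comes along for free.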