Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow

From: Sean Christopherson
Date: Thu Jan 20 2022 - 12:42:43 EST


On Thu, Jan 20, 2022, Andrew Cooper wrote:
> On 20/01/2022 16:36, Sean Christopherson wrote:
> > On Thu, Jan 20, 2022, Andrew Cooper wrote:
> >> On 20/01/2022 00:06, Sean Christopherson wrote:
> >>> MOVSS blocking can be initiated by userspace, but can be coincident with
> >>> a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
> >>> executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking
> >>> is problematic only for CPL0 (and only if the guest is crazy enough to
> >>> access a DR in a MOVSS shadow). All other sources of #DBs are either
> >>> suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
> >> It is more complicated than this and undocumented.  Single step is
> >> discard in a shadow, while data breakpoints are deferred.
> > But for the purposes of making the consitency check happy, whether they are
> > deferred or dropped should be irrelevant, no?
>
> From that point of view, yes.  The consistency check is specific to TS. 
> I suppose I was mostly questioning the wording of the explanation.
>
> >>> are mutually exclusive with MOVSS blocking (T-bit task switch),
> >> Howso?  MovSS prevents external interrupts from triggering task
> >> switches, but instruction sources still trigger in a shadow.
> > T-bit #DBs are traps, and arrive after the task switch has completed. The switch
> > can be initiated in the shadow, but the #DB will be delivered after the instruction
> > retires and so after MOVSS blocking goes away. Or am I missing something?
>
> Well - this is where the pipeline RTL is needed, in lieu of anything
> better.  Trap-style #DBs are part of the current instruction, and
> specifically ahead (in the instruction cycle) of the subsequent intchk.

And T-bit traps in particular have crazy high priority...

> There are implementations where NMI/INTR/etc won't be delivered at the
> head of an exception generated in a shadow, which would suggest that
> these implementations have the falling edge of the shadow after intchk
> on the instruction boundary.  (Probably certainly what happens is that
> intchk is responsible for clearing the shadow, but this is entirely
> guesswork on my behalf.)

Well, thankfully hardware's behavior should be moot for VM-Entry since task switches
unconditionally VM-Exit, and KVM has a big fat TODO for handling the T-bit.

> >> and splitlock which is new since I last thought about this problem.
> > Eww. Split Lock is trap-like, which begs the question of what happens if the
> > MOV/POP SS splits a cache line when loading the source data. I'm guess it's
> > suppressed, a la data breakpoints, but that'd be a fun one to test.
>
> They're both reads of their memory operand, so aren't eligible to be
> locked accesses.

Hah, right, the "lock" part of "split lock" is just a minor detail...

> However, a devious kernel can misalign the GDT/LDT such that setting the
> descriptor access bit does trigger a splitlock.  I suppose "kernel
> doesn't misalign structures", or "kernel doesn't write a descriptor with
> the access bit clear" are both valid mitigations.
>
> >>> This bug was originally found by running tests[1] created for XSA-308[2].
> >>> Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
> >>> presumably why the Xen bug was deemed to be an exploitable DOS from guest
> >>> userspace.
> >> As I recall, the original report to the security team was something
> >> along the lines of "Steam has just updated game, and now when I start
> >> it, the VM explodes".
> > Lovely. I wonder if the game added some form of anti-cheat? I don't suppose you
> > have disassembly from the report? I'm super curious what on earth a game would
> > do to trigger this.
>
> Anti-cheat was my guess too, but no disassembly happened.
>
> I was already aware of the STI issue, and had posted
> https://lore.kernel.org/xen-devel/1528120755-17455-11-git-send-email-andrew.cooper3@xxxxxxxxxx/
> more than a year previously.  The security report showed ICEBP pending
> in the INTR_INFO field, and extending the STI test case in light of this
> was all of 30s of work to get a working repro.
>
> ~Andrew