Re: [RFC PATCH 00/47] Address Space Isolation for KVM

From: junaid_shahid
Date: Sun Apr 10 2022 - 23:27:21 EST


Hi Alex,

> On 3/23/22 20:35, Junaid Shahid wrote:
>> On 3/22/22 02:46, Alexandre Chartre wrote:
>>>
>>> So if I understand correctly, you have following sequence:
>>>
>>> 0 - Initially state is set to "stunned" for all cpus (i.e. a cpu
>>> should wait before VMEnter)
>>>
>>> 1 - After ASI Enter: Set sibling state to "unstunned" (i.e. sibling
>>> can do VMEnter)
>>>
>>> 2 - Before VMEnter : wait while my state is "stunned"
>>>
>>> 3 - Before ASI Exit : Set sibling state to "stunned" (i.e. sibling
>>> should wait before VMEnter)
>>>
>>> I have tried this kind of implementation, and the problem is with
>>> step 2 (wait while my state is "stunned"); how do you wait exactly?
>>> You can't just do an active wait otherwise you have all kind of
>>> problems (depending if you have interrupts enabled or not)
>>> especially as you don't know how long you have to wait for (this
>>> depends on what the other cpu is doing).
>>
>> In our stunning implementation, we do an active wait with interrupts
>> enabled and with a need_resched() check to decide when to bail out
>> to the scheduler (plus we also make sure that we re-enter ASI at the
>> end of the wait in case some interrupt exited ASI). What kind of
>> problems have you run into with an active wait, besides wasted CPU
>> cycles?
>
> If you wait with interrupts enabled then there is window after the
> wait and before interrupts get disabled where a cpu can get an interrupt,
> exit ASI while the sibling is entering the VM.

We actually do another check after disabling interrupts, and if it turns out
that we need to wait again, we re-enable interrupts and go back to the wait
loop. But, irrespective of that,

> Also after a CPU has passed
> the wait and have disable interrupts, it can't be notified if the sibling
> has exited ASI:

I don't think that this is actually the case. Yes, the IPI from the sibling
will be blocked while the host kernel has disabled interrupts. However, when
the host kernel executes a VMENTER, if there is a pending IPI, the VM will
immediately exit back to the host even before executing any guest code. So
AFAICT there is not going to be any data leak in the scenario that you
mentioned. Basically, the "cpu B runs VM" in step T+06 won't actually happen.

>
> T+01 - cpu A and B enter ASI - interrupts are enabled
> T+02 - cpu A and B pass the wait because both are using ASI - interrupts are enabled
> T+03 - cpu A gets an interrupt
> T+04 - cpu B disables interrupts
> T+05 - cpu A exit ASI and process interrupts
> T+06 - cpu B enters VM => cpu B runs VM while cpu A is not using ASI
> T+07 - cpu B exits VM
> T+08 - cpu B exits ASI
> T+09 - cpu A returns from interrupt
> T+10 - cpu A disables interrupts and enter VM => cpu A runs VM while cpu A is not using ASI

The "cpu A runs VM while cpu A is not using ASI" case will not happen either,
because cpu A will re-enter ASI after disabling interrupts and before entering
the VM.
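
For illustration only, the wait / re-check / re-enter-ASI sequence described
above could be sketched in user-space C roughly as follows. This is not the
actual implementation: every name here (cpu_sim, sim_tick, prepare_vmenter,
the event script) is hypothetical, and interrupts and sibling activity are
simulated by a scripted list of events rather than real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
enum stun_state { STUNNED, UNSTUNNED };
enum sim_event { EV_SIBLING_ENTERS_ASI, EV_SIBLING_EXITS_ASI, EV_LOCAL_IRQ };

struct cpu_sim {
	enum stun_state state;        /* written by the sibling */
	bool irqs_enabled;
	bool in_asi;
	bool need_resched;
	const enum sim_event *script; /* simulated outside activity */
	size_t script_len;
	size_t script_pos;
	unsigned int wait_restarts;   /* times the re-check sent us back */
};

/* Apply the next scripted event; request a reschedule when none remain. */
static void sim_tick(struct cpu_sim *cpu)
{
	if (cpu->script_pos >= cpu->script_len) {
		cpu->need_resched = true;
		return;
	}
	switch (cpu->script[cpu->script_pos++]) {
	case EV_SIBLING_ENTERS_ASI:
		cpu->state = UNSTUNNED;
		break;
	case EV_SIBLING_EXITS_ASI:
		cpu->state = STUNNED;
		break;
	case EV_LOCAL_IRQ:
		/* The interrupt handler runs outside ASI. */
		if (cpu->irqs_enabled)
			cpu->in_asi = false;
		break;
	}
}

/*
 * Returns true when it is safe to proceed to VMENTER (interrupts off,
 * ASI entered, not stunned), false when bailing out to the scheduler.
 */
static bool prepare_vmenter(struct cpu_sim *cpu)
{
	for (;;) {
		/* Active wait with interrupts enabled. */
		cpu->irqs_enabled = true;
		while (cpu->state == STUNNED) {
			if (cpu->need_resched)
				return false;
			sim_tick(cpu);
		}

		/* Window between the wait and disabling interrupts. */
		sim_tick(cpu);

		cpu->irqs_enabled = false;
		/* Re-enter ASI in case an interrupt exited it. */
		cpu->in_asi = true;

		/* Second check, now with interrupts disabled. */
		if (cpu->state == UNSTUNNED) {
			assert(!cpu->irqs_enabled && cpu->in_asi);
			return true;
		}
		cpu->wait_restarts++;
	}
}
```

The sketch only covers the software side. The case where the sibling exits
ASI after the second check is the one handled by the pending IPI forcing an
immediate VM exit, which is not modelled here.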

>
>> In any case, the specific stunning mechanism is orthogonal to ASI.
>> This implementation of ASI can be integrated with different stunning
>> implementations. The "kernel core scheduling" that you proposed is
>> also an alternative to stunning and could be similarly integrated
>> with ASI.
>
> Yes, but for ASI to be relevant with KVM to prevent data leak, you need
> a fully functional and reliable stunning mechanism, otherwise ASI is
> useless. That's why I think it is better to first focus on having an
> effective stunning mechanism and then implement ASI.
>

Sure, that makes sense. The only caveat is that, at least in our testing, the
overhead of stunning alone, without ASI, seemed too high. But I can look into
whether we can post our stunning implementation with the next version of the
RFC.

Thanks,
Junaid

PS: I am away from the office for a few weeks, so email replies may be delayed
until next month.