Re: [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel

From: Kuppuswamy, Sathyanarayanan
Date: Wed Aug 04 2021 - 18:23:14 EST




On 8/4/21 2:48 PM, Dave Hansen wrote:
>> No, #GP is triggered by guest.
>> ...
>> Regardless of #GP versus #VE, "Table 16.2 MSR Virtualization" needs
>> to state the actual behavior.
>> Even in this case, it will trigger #VE. But since CSTAR MSR is not
>> supported, write to it will fail and leads to #VE fault.
> Sathya, I think there might be a mixup of terminology here that's
> confusing. I'm confused by this exchange.
>
> In general, we refer to hardware exceptions by their architecture names:
> #GP for general protection fault, #PF for page fault, #VE for
> Virtualization Exception.
>
> Those hardware exceptions are wired up to software handlers:
> #GP lands in asm_exc_general_protection
> #PF ends up in exc_page_fault
> #VE ends up in exc_virtualization_exception
> ... and more of course
>
> But, to add to the confusion, the #VE handler
> (exc_virtualization_exception()) itself calls (or did once upon a time
> call) do_general_protection() when it can't handle something.
> do_general_protection() is (was?) *ALSO* called by the #GP handler.
>
> So, is that what you meant? By "#GP is triggered by guest", you mean
> that a write to the CSTAR MSR and the resulting #VE will end up being
> handled in a way that is similar to how a #GP hardware exception would
> have been handled?
>
> If that's what you meant, I'm not _sure_ that's totally accurate. Could
> you elaborate on this a bit? It also would be really handy if you were
> able to adopt the terminology I talked about above. It will really make
> things less confusing.
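Adopting the terminology above, the mapping from architectural exception names to their software handlers can be written down as a small lookup table. This is only an illustrative user-space sketch: the vector numbers (#GP = 13, #PF = 14, #VE = 20) are the architectural ones from the Intel SDM, the handler names are taken from the list above, and struct vector_entry and handler_for() are hypothetical names.

```c
#include <stddef.h>

/* Architectural exception vector numbers (Intel SDM, Vol. 3). */
enum x86_vector {
	VEC_GP = 13,	/* #GP: General Protection */
	VEC_PF = 14,	/* #PF: Page Fault */
	VEC_VE = 20,	/* #VE: Virtualization Exception */
};

/* Hypothetical record pairing a hardware exception with its handler. */
struct vector_entry {
	enum x86_vector vector;
	const char *mnemonic;
	const char *handler;	/* software handler the vector is wired to */
};

static const struct vector_entry vector_table[] = {
	{ VEC_GP, "#GP", "asm_exc_general_protection" },
	{ VEC_PF, "#PF", "exc_page_fault" },
	{ VEC_VE, "#VE", "exc_virtualization_exception" },
};

/* Return the handler name for a vector, or NULL if it is not listed. */
static const char *handler_for(enum x86_vector vector)
{
	for (size_t i = 0; i < sizeof(vector_table) / sizeof(vector_table[0]); i++)
		if (vector_table[i].vector == vector)
			return vector_table[i].handler;
	return NULL;
}
```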


In a TDX guest, a write to this MSR triggers a #VE, which is handled by
exc_virtualization_exception()->tdg_handle_virtualization_exception().
Internally, this handler emulates the MSR write using hypercalls. If
the hypercall returns failure, we have failed to handle the #VE. In
that case, exc_virtualization_exception() produces #GP-like behavior
by calling ve_raise_fault(), which is a customized version of
do_general_protection(). This is what I meant by "the guest triggers
#GP(0)".
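A user-space model of the flow just described may make the distinction clearer. It is a sketch under stated assumptions, not the TDX code: fake_hypercall_wrmsr() and fake_ve_raise_fault() are hypothetical stand-ins for the MSR-write hypercall and ve_raise_fault(), and the model simply assumes the VMM rejects writes to MSR_CSTAR (0xc0000083), as described above.

```c
#include <stdint.h>

#define MSR_LSTAR	0xc0000082u
#define MSR_CSTAR	0xc0000083u

enum ve_outcome {
	VE_HANDLED,	/* hypercall emulated the MSR write */
	VE_RAISED_GP,	/* emulation failed: #GP(0)-like fault raised */
};

/* Hypothetical VMM side: assume the host rejects writes to CSTAR. */
static int fake_hypercall_wrmsr(uint32_t msr, uint64_t val)
{
	(void)val;
	return msr == MSR_CSTAR ? -1 : 0;
}

/* Counts #GP-like faults; stands in for ve_raise_fault(regs, 0). */
static int gp_like_faults;

static void fake_ve_raise_fault(long error_code)
{
	(void)error_code;
	gp_like_faults++;
}

/* Models: #VE -> emulate via hypercall -> on failure, #GP(0)-like fault. */
static enum ve_outcome handle_msr_write_ve(uint32_t msr, uint64_t val)
{
	if (fake_hypercall_wrmsr(msr, val)) {
		fake_ve_raise_fault(0);
		return VE_RAISED_GP;
	}
	return VE_HANDLED;
}
```

The point of the model is that the hardware exception is always #VE; the "#GP" only describes how the #VE handler reacts when emulation fails.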

Since the CSTAR MSR is not supported or used on Intel platforms, rather
than going through this whole process only to end in a failure, we
added a check that skips the write on Intel in the first place.

The implementation details are below:
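A minimal sketch of that check, assuming a simplified vendor test. In the kernel the boot CPU vendor is available as boot_cpu_data.x86_vendor; the enum, fake_wrmsr() and write_cstar() below are hypothetical names used only to show that no CSTAR write (and therefore no #VE) happens on Intel, which does not support 32-bit SYSCALL.

```c
#include <stdint.h>

#define MSR_CSTAR	0xc0000083u

/* Hypothetical vendor enum for the sketch. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Number of MSR writes issued; in a TDX guest each would be a #VE. */
static unsigned int msr_writes;

static void fake_wrmsr(uint32_t msr, uint64_t val)
{
	(void)msr;
	(void)val;
	msr_writes++;
}

/* Skip the CSTAR write on Intel, where compat-mode SYSCALL is unsupported. */
static void write_cstar(enum cpu_vendor vendor, uint64_t entry)
{
	if (vendor != VENDOR_INTEL)
		fake_wrmsr(MSR_CSTAR, entry);
}
```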

static void ve_raise_fault(struct pt_regs *regs, long error_code)
{
	struct task_struct *tsk = current;

	if (user_mode(regs)) {
		tsk->thread.error_code = error_code;
		tsk->thread.trap_nr = X86_TRAP_VE;

		/*
		 * Not fixing up VDSO exceptions similar to #GP handler
		 * because we don't expect the VDSO to trigger #VE.
		 */
		show_signal(tsk, SIGSEGV, "", VEFSTR, regs, error_code);
		force_sig(SIGSEGV);
		return;
	}

	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
		return;

	tsk->thread.error_code = error_code;
	tsk->thread.trap_nr = X86_TRAP_VE;

	/*
	 * To be potentially processing a kprobe fault and to trust the result
	 * from kprobe_running(), we have to be non-preemptible.
	 */
	if (!preemptible() &&
	    kprobe_running() &&
	    kprobe_fault_handler(regs, X86_TRAP_VE))
		return;

	notify_die(DIE_GPF, VEFSTR, regs, error_code, X86_TRAP_VE, SIGSEGV);

	die_addr(VEFSTR, regs, error_code, 0);
}


DEFINE_IDTENTRY(exc_virtualization_exception)
{
	struct ve_info ve;
	int ret;

	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

	inc_irq_stat(tdg_ve_count);

	/*
	 * NMIs/Machine-checks/Interrupts will be in a disabled state
	 * till TDGETVEINFO TDCALL is executed. This prevents #VE
	 * nesting issue.
	 */
	ret = tdg_get_ve_info(&ve);

	cond_local_irq_enable(regs);

	if (!ret)
		ret = tdg_handle_virtualization_exception(regs, &ve);

	/*
	 * If tdg_handle_virtualization_exception() could not process
	 * it successfully, treat it as #GP(0) and handle it.
	 */
	if (ret)
		ve_raise_fault(regs, 0);

	cond_local_irq_disable(regs);
}
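The order of checks in ve_raise_fault() can be summarized as a pure function. This is a hypothetical model, not kernel code: struct fault_ctx and classify_ve_fault() are illustrative names, and each boolean stands in for the corresponding call (user_mode(), fixup_exception(), preemptible(), kprobe_running(), kprobe_fault_handler()).

```c
#include <stdbool.h>

enum fault_action {
	ACT_SIGSEGV,	/* user mode: deliver SIGSEGV to the task */
	ACT_FIXUP,	/* kernel mode: exception-table fixup applied */
	ACT_KPROBE,	/* kernel mode: kprobe fault handler consumed it */
	ACT_DIE,	/* kernel mode: notify_die() then die_addr() */
};

/* Inputs ve_raise_fault() checks, reduced to booleans for the sketch. */
struct fault_ctx {
	bool user_mode;
	bool has_fixup;
	bool preemptible;
	bool kprobe_running;
	bool kprobe_handled;
};

/* Mirrors the order of the checks in ve_raise_fault(). */
static enum fault_action classify_ve_fault(const struct fault_ctx *c)
{
	if (c->user_mode)
		return ACT_SIGSEGV;
	if (c->has_fixup)
		return ACT_FIXUP;
	if (!c->preemptible && c->kprobe_running && c->kprobe_handled)
		return ACT_KPROBE;
	return ACT_DIE;
}
```

As in the original code, the kprobe path is only considered when the context is non-preemptible; everything else in kernel mode ends in die_addr().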
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer