Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

From: Dave Hansen
Date: Tue May 17 2022 - 18:16:48 EST


On 5/17/22 13:17, Kirill A. Shutemov wrote:
>>> Given that we had to adjust IP in handle_mmio() anyway, do you still think
>>> "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
>> Something is wrong about it.
>>
>> You could call it 've->instr_bytes_to_handle' or something. Then it
>> makes actual logical sense when you handle it to zero it out. I just
>> want it to be more explicit when the upper levels need to do something.
>>
>> Does ve->instr_len==0 both when the TDX module isn't providing
>> instruction sizes *and* when no handling is necessary? That seems like
>> an unfortunate logical multiplexing of 0.
> For EPT violation, ve->instr_len has *something* (not zero) that doesn't
> match the actual instruction size. I dug out that it is filled with data
> from VMREAD(0x440C), but I don't know where the data ultimately
> originates.

The SDM has a breakdown of when that field (VMCS encoding 0x440C, the
VM-exit instruction length) is defined:

27.2.5 Information for VM Exits Due to Instruction Execution

I didn't realize it came from VMREAD. I guess I assumed it came from
some TDX module magic. Silly me.

The SDM makes it sound like we should be more judicious about using
've->instr_len' though. "All VM exits other than those listed in the
above items leave this field undefined." Looking over
virt_exception_kernel(), we've got five cases from CPU instructions that
cause unconditional VMEXITs:

case EXIT_REASON_HLT:
case EXIT_REASON_MSR_READ:
case EXIT_REASON_MSR_WRITE:
case EXIT_REASON_CPUID:
case EXIT_REASON_IO_INSTRUCTION:

and should have that field filled out, plus one that doesn't:

case EXIT_REASON_EPT_VIOLATION:

It seems awfully fragile to me to have the hardware provide
'instr_len' in those cases, but not in the remaining one. The data in
there is garbage for EXIT_REASON_EPT_VIOLATION. The reason we don't
consume garbage is that all the paths leading out of handle_mmio() that
return true also set 've->instr_len'. But that logic is entirely opaque.
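To make the hazard concrete, here is a stripped-down user-space model of the implicit contract described above. It assumes a bool-returning handler and a caller that advances RIP by 've->instr_len' on success; the struct layouts, fake_decode_length() and the *_implicit function names are hypothetical stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins carrying only the fields this sketch needs. */
struct pt_regs { unsigned long ip; };
struct ve_info { unsigned int exit_reason; unsigned int instr_len; };

#define EXIT_REASON_EPT_VIOLATION 48

/* Hypothetical decoder: pretend the faulting MMIO instruction is 3 bytes. */
static unsigned int fake_decode_length(const struct pt_regs *regs)
{
	(void)regs;
	return 3;
}

/*
 * Old-style MMIO handler: returns only success/failure. For EPT
 * violations ve->instr_len is undefined, so this handler must remember
 * to overwrite it before returning true. Nothing enforces that.
 */
static bool handle_mmio_implicit(struct pt_regs *regs, struct ve_info *ve)
{
	unsigned int len = fake_decode_length(regs);

	assert(len > 0 && len <= 15);	/* x86 instructions are 1..15 bytes */
	ve->instr_len = len;		/* the easy-to-miss side effect */
	return true;
}

/* Caller: on success, trusts whatever happens to be in ve->instr_len. */
static bool handle_ve_implicit(struct pt_regs *regs, struct ve_info *ve)
{
	if (!handle_mmio_implicit(regs, ve))
		return false;
	regs->ip += ve->instr_len;
	return true;
}
```

Drop the 've->instr_len = len' line and the caller silently advances RIP by the undefined VMREAD value; the compiler has no way to flag the omission.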

It's also borderline criminal to have six functions that look identical
(in that switch statement), but one of them has different behavior for
've->instr_len'.

I'd probably do it like this:

static int handle_halt(struct ve_info *ve)
{
	/*
	 * Since non-safe halt is mainly used in CPU offlining
	 * and the guest will always stay in the halt state, don't
	 * execute STI (set do_sti to false).
	 */
	const bool irq_disabled = irqs_disabled();
	const bool do_sti = false;

	if (__halt(irq_disabled, do_sti))
		return -EIO;

	/*
	 * VM-exit instruction length is defined for HLT. See:
	 * "Information for VM Exits Due to Instruction Execution"
	 * in the SDM.
	 */
	return ve->instr_len;
}

Any return value >= 0 means the exception was handled, and it tells the
caller how much to advance RIP. A negative value means it wasn't handled.
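A caller for that convention might look roughly like the user-space sketch below. It is a model, not kernel code: fake_halt() stands in for __halt(), and the struct definitions and the *_sketch names are assumptions made so the example compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins carrying only the fields this sketch needs. */
struct pt_regs { unsigned long ip; };
struct ve_info { unsigned int exit_reason; unsigned int instr_len; };

#define EXIT_REASON_HLT 12

/* Hypothetical stand-in for __halt(): 0 means the hypercall succeeded. */
static unsigned long fake_halt(bool irq_disabled, bool do_sti)
{
	(void)irq_disabled;
	(void)do_sti;
	return 0;
}

/* Returns <0 on error, otherwise the number of bytes to skip. */
static int handle_halt_sketch(struct ve_info *ve)
{
	if (fake_halt(true, false))
		return -EIO;

	/* VM-exit instruction length is defined for HLT (SDM 27.2.5). */
	return ve->instr_len;
}

static int dispatch_sketch(struct ve_info *ve)
{
	switch (ve->exit_reason) {
	case EXIT_REASON_HLT:
		return handle_halt_sketch(ve);
	default:
		return -EIO;	/* unhandled exit reason */
	}
}

/* Caller: a negative value is an error, anything else advances RIP. */
static bool handle_ve_sketch(struct pt_regs *regs, struct ve_info *ve)
{
	int insn_len = dispatch_sketch(ve);

	if (insn_len < 0)
		return false;

	assert(insn_len <= 15);	/* x86 instructions are at most 15 bytes */
	regs->ip += insn_len;
	return true;
}
```

The caller no longer reads 've->instr_len' at all; only the individual handlers do, and only where the SDM says the field is valid.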

Then handle_mmio() can say:

	/*
	 * VM-exit instruction length is not provided for the EPT
	 * violations that MMIO causes. Use the insn_decode() length:
	 */
	return insn.length;
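With that change, each case in the switch has to name where its length comes from. Here is a compilable sketch of two such cases, assuming the int-return convention above; fake_insn_decode_length() is a hypothetical stand-in for decoding with insn_decode(), and the types and *_sketch names are illustrative only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins carrying only the fields this sketch needs. */
struct pt_regs { unsigned long ip; };
struct ve_info { unsigned int exit_reason; unsigned int instr_len; };

#define EXIT_REASON_CPUID          10
#define EXIT_REASON_EPT_VIOLATION  48

/* Hypothetical decoder: length of the instruction at regs->ip, 0 on failure. */
static int fake_insn_decode_length(const struct pt_regs *regs)
{
	(void)regs;
	return 4;	/* pretend: a 4-byte MOV to an MMIO address */
}

/* CPUID: the VM-exit instruction length is defined, so use it. */
static int handle_cpuid_sketch(struct ve_info *ve)
{
	/* ...emulate CPUID here... */
	return ve->instr_len;
}

/* MMIO: the VM-exit length is undefined, so use the decoded length. */
static int handle_mmio_sketch(struct pt_regs *regs, struct ve_info *ve)
{
	int len = fake_insn_decode_length(regs);

	(void)ve;	/* ve->instr_len is deliberately ignored here */
	if (len <= 0)
		return -EIO;
	assert(len <= 15);	/* x86 instructions are at most 15 bytes */
	/* ...emulate the MMIO access here... */
	return len;
}

static int virt_exception_sketch(struct pt_regs *regs, struct ve_info *ve)
{
	switch (ve->exit_reason) {
	case EXIT_REASON_CPUID:
		return handle_cpuid_sketch(ve);
	case EXIT_REASON_EPT_VIOLATION:
		return handle_mmio_sketch(regs, ve);
	default:
		return -EIO;
	}
}
```

Even with garbage in 've->instr_len', the EPT-violation path returns the decoded length, because that handler never looks at the field.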

See? Now everybody that goes and writes a new #VE exception helper has
a chance of actually getting this right. As it stands, if someone adds
one more of these, they'll probably get random behavior. This way, they
actually have to choose. They _might_ even go looking at the SDM.