Re: [PATCH v5 22/34] x86/fred: FRED initialization code

From: andrew.cooper3@xxxxxxxxxx
Date: Mon Mar 20 2023 - 21:02:35 EST


On 21/03/2023 12:12 am, Li, Xin3 wrote:
>>> If there is no concrete reason other than overflow for assigning
>>> NMI and #DB a stack level > 0, #VE should also be assigned a stack
>>> level > 0, and so should #BP. #VE can happen anytime and anywhere,
>>> so it is subject to overflow too.
>> So #BP needs the stack-gap (redzone) for text_poke_bp().
>>
>> #BP can end up in kprobes which can then end up in ftrace/perf, depending on
>> how it's all wired up.
>>
>> #VE is currently a trainwreck vs NMI/MCE, but I think FRED solves the worst of
>> that. I'm not exactly sure how deep the #VE handler goes.
>>
> #VE under IDT is *not* using an IST; we need some solid rationale here.

#VE, and #VC on AMD, are borderline unusable.  Both under IDT and FRED.

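#VE is not IST because there are plenty of real cases where a
non-malicious outer hypervisor could cause reentrant faults that lose
program state, e.g. hitting an IO instruction, then hitting an emulated
MSR inside the #VE handler.  On an IST stack, the nested #VE's frame is
written to the same place as the outer one and overwrites it.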

There are fewer cases where a non-IST #VE ends up in a reentrant fault
(IIRC, you can still manage it by unmapping the entry stack), but you're
still trusting the outer hypervisor not to, e.g., unmap the SYSCALL
entry point.

FRED eliminates both the "reentrant fault overwrites the outer frame on
the stack" case and the SYSCALL-gap case.  In the worst case it replaces
them with a stack overflow, because there is still no upper bound on how
many times #VE can be delivered while servicing a single #VE.
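A similarly simplified model of the FRED case, assuming (hypothetically)
a stack with room for eight frames and a hypervisor that keeps injecting
#VE.  Each nested delivery pushes a new frame below the previous one, so
the outer state survives, but with no bound on nesting the stack
eventually runs out.  `struct fred_stack`, `fred_deliver()` and
`fred_nested_scenario()` are illustrative names, not kernel or FRED-spec
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FRAMES 8	/* hypothetical stack size, in frames */

struct fred_stack {
	uint64_t rip[STACK_FRAMES];	/* saved RIP, one per nested delivery */
	size_t depth;			/* frames currently pushed */
	bool overflowed;
};

/* Push a frame below the current one; false means the stack overflowed
 * (in a real kernel, e.g. by running into a guard page). */
static bool fred_deliver(struct fred_stack *s, uint64_t interrupted_rip)
{
	assert(s->depth <= STACK_FRAMES);
	if (s->depth == STACK_FRAMES) {
		s->overflowed = true;
		return false;
	}
	s->rip[s->depth++] = interrupted_rip;	/* earlier frames untouched */
	return true;
}

/* One #VE plus 'nests' further #VEs injected while servicing it.
 * Returns how many deliveries fit on the stack. */
static size_t fred_nested_scenario(struct fred_stack *s, size_t nests)
{
	size_t delivered = 0;

	for (size_t i = 0; i <= nests; i++) {
		if (!fred_deliver(s, 0x1000 + i))
			break;
		delivered++;
	}
	return delivered;
}
```

A few levels of nesting keep every frame intact; an unbounded stream
eventually sets `overflowed`, which is the residual worst case.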

~Andrew

P.S. While I hate to cite myself, if you haven't read
https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing
yet, do so.  It did feed into some of the FRED design.