Re: [PATCHv7 10/14] x86/mm: Avoid load_unaligned_zeropad() stepping into unaccepted memory

From: Dave Hansen
Date: Tue Aug 02 2022 - 19:46:44 EST


On 7/26/22 03:21, Borislav Petkov wrote:
> On Tue, Jun 14, 2022 at 03:02:27PM +0300, Kirill A. Shutemov wrote:
>> But, this approach does not work for unaccepted memory. For TDX, a load
>> from unaccepted memory will not lead to a recoverable exception within
>> the guest. The guest will exit to the VMM where the only recourse is to
>> terminate the guest.
> FTR, this random-memory-access-to-unaccepted-memory-is-deadly thing is
> really silly. We should be able to handle such cases - because they do
> happen often - in a more resilient way. Just look at the complex dance
> this patch needs to do just to avoid this.
>
> IOW, this part of the coco technology needs improvement.

This particular wound is self-inflicted. The hardware can *today*
generate a #VE for these accesses. But, to make writing the #VE code
more straightforward, we asked that the hardware not even bother
delivering the exception. At the time, nobody could come up with a case
why there would ever be a legitimate, non-buggy access to unaccepted memory.

We learned about load_unaligned_zeropad() the hard way. I never ran
into it and never knew it was there. Dangit.

We _could_ go back to the way it was originally. We could add
load_unaligned_zeropad() support to the #VE handler, and there's little
risk of load_unaligned_zeropad() itself being used in the
interrupts-disabled window early in the #VE handler. That would get rid
of all the nasty adjacent page handling in the unaccepted memory code.

But, that would mean that we can land in the #VE handler from more
contexts. Any normal, non-buggy use of load_unaligned_zeropad() can end
up there, obviously. We would, for instance, need to be more careful
about #VE recursion. We'd also have to make sure that _bugs_ that land
in the #VE handler can still be handled in a sane way.

To sum it all up, I'm not happy with the complexity of the page
acceptance code either but I'm not sure that it's bad tradeoff compared
to greater #VE complexity or fragility.

Does anyone think we should go back and really reconsider this?