Re: [PATCH v22 23/24] docs: x86/sgx: Document microarchitecture

From: Randy Dunlap
Date: Fri Sep 27 2019 - 14:15:33 EST


Hi,

doc edits for you:

On 9/3/19 7:26 AM, Jarkko Sakkinen wrote:
> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>
> Document microarchitectural features of Intel SGX relevant to the
> kernel.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
> ---
> Documentation/x86/index.rst | 1 +
> Documentation/x86/sgx/1.Architecture.rst | 431 +++++++++++++++++++++++
> Documentation/x86/sgx/index.rst | 16 +
> 3 files changed, 448 insertions(+)
> create mode 100644 Documentation/x86/sgx/1.Architecture.rst
> create mode 100644 Documentation/x86/sgx/index.rst


> diff --git a/Documentation/x86/sgx/1.Architecture.rst b/Documentation/x86/sgx/1.Architecture.rst
> new file mode 100644
> index 000000000000..a4de6c610231
> --- /dev/null
> +++ b/Documentation/x86/sgx/1.Architecture.rst
> @@ -0,0 +1,431 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +Architecture
> +============
> +
> +Introduction
> +============
> +
> +SGX is a set of instructions and mechanisms that enable ring 3 applications to
> +set aside private regions of code and data for the purpose of establishing and
> +running enclaves. An enclave is a secure entity whose private memory can only
> +be accessed by code running within the enclave. Accesses from outside the
> +enclave, including software running at a higher privilege level and other
> +enclaves, are disallowed by hardware.
> +
> +SGX also provides for local and remote attestation. `Attestation`_ allows an
> +enclave to attest its identity, that it has not been tampered with, that it is
> +running on a genuine platform with Intel SGX enabled, and the security
> +properties of the platform on which it is running.
> +
> +You can determine if your CPU supports SGX by querying ``/proc/cpuinfo``:
> +
> + ``cat /proc/cpuinfo | grep sgx``
> +
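
For illustration only: a plain ``grep sgx`` also matches related flags such as ``sgx_lc``. A minimal Python sketch of an exact-token check, assuming the feature shows up as an ``sgx`` token on the ``flags`` line (the function names are hypothetical, not from the patch):

```python
def cpu_flags(cpuinfo_text):
    """Collect the tokens from every 'flags' line of /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def cpu_has_sgx(cpuinfo_text):
    """True only for an exact 'sgx' token; 'sgx_lc' alone does not count."""
    return "sgx" in cpu_flags(cpuinfo_text)
```

On a Linux system this could be fed with ``open("/proc/cpuinfo").read()``.
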
> +
> +Enclave Page Cache
> +==================
> +
> +SGX utilizes an Enclave Page Cache (EPC) to store pages that are associated
> +with an enclave. The EPC is secure storage whose exact physical implementation
> +is micro-architecture specific (see `EPC Implemenations`_). Similar to normal
> +system memory, the EPC is managed by privileged software using conventional
> +paging mechanisms, e.g. the kernel can grant/deny access to EPC memory by
> +manipulating a process' page tables, and can swap pages in/out of the EPC in
> +order to oversubscribe the EPC.
> +
> +Unlike regular memory, hardware prevents arbitrary insertion, eviction,
> +deletion, access, etc... to/from the EPC. Software must instead use dedicated
> +`SGX instructions`_ to operate on the EPC, which enables the processor to
> +provide SGX's security guarantees by enforcing various restrictions and
> +behaviors, e.g. limits concurrent accesses to EPC pages and ensures proper TLB
> +flushing when moving pages in/out of the EPC.
> +
> +Accesses to EPC pages are allowed if and only if the access is classified as an
> +"enclave access". There are two categories of allowed enclave accesses: direct
> +and indirect. Direct enclave accesses are generated if and only the processor

only if the

> +is executing in Enclave Mode (see `Enclave execution`_). Indirect enclave
> +accesses are generated by various ENCL{S,U,V} functions, many of which can be
> +executed outside of Enclave Mode.
> +
> +Non-enclave accesses to the EPC result in undefined behavior. Conversely,
> +enclave accesses to non-EPC memory result in a page fault (#PF)[1]_. Page
> +faults due to invalid enclave accesses set the PF_SGX flag (bit 15) in the page
> +fault error code[2]_.
> +
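
A small sketch of the error-code check described above, assuming only what the text states (PF_SGX is bit 15 of the page fault error code). The helper name is made up for illustration; per footnote [1], processors without SGX2 signal #GP(0) instead, so no such bit would be seen there.

```python
PF_SGX = 1 << 15  # bit 15 of the page fault error code, per the text above


def is_sgx_fault(error_code):
    """True if the page fault error code has the PF_SGX bit set."""
    return bool(error_code & PF_SGX)
```
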
> +Although all EPC implementations will undoubtedly encrypt the EPC itself, all
> +all EPC code/data is stored unencrypted in the processor's caches. I.e. SGX

drop duplicate "all" above.

> +relies on the aforementioned mechanisms to protect an enclave's secrets while
> +they are resident in the cache.
> +
> +Note, EPC pages are always 4KB sized and aligned. Software can map EPC using
> +using large pages, but the processor always operates on a 4KB granularity when

drop duplicate "using" above.

> +working with EPC pages.
> +
> +
> +SGX instructions
> +================
> +
> +SGX introduces three new instructions, ENCLS, ENCLU and ENCLV, for Supervisor,
> +User and Virtualization respectively. ENCL{S,U,V} are umbrella instructions,
> +using a single opcode as the front end to a variety of SGX functions. The leaf
> +function to execute is specified via %eax, with %rbx, %rcx and %rdx optionally
> +used for leaf-specific purposes.
> +
> +Note that supervisor software, i.e. the kernel, creates and manages enclaves,
> +but only user-level software can execute/enter an enclave.
> +
> +ENCLS Leafs

Plural is usually "Leaves", but I'll leave (no pun intended) that up to you.

> +-----------
> +
> + - ECREATE: create an enclave
> + - EADD: add page to an uninitialized enclave
> + - EAUG: add page to an initialized enclave
> + - EEXTEND: extend the measurement of an (uninitialized) enclave
> + - EINIT: verify and initialize enclave
> + - EDBG{RD,WR}: read/write from/to a debug enclave's memory
> + - EMODPR: restrict an EPC page's permissions
> + - EMODT: modify an EPC page's type
> + - EBLOCK: mark a page as blocked in EPCM
> + - ETRACK{C}: activate blocking tracing
> + - EWB: write back page from EPC to regular memory
> + - ELD{B,U}{C}: load page in {un}blocked state from system memory to EPC
> + - EPA: add version array (used to track evicted EPC pages)
> + - EREMOVE: remove a page from EPC
> + - ERDINFO: retrieve info about an EPC page from EPCM
> +
> +ENCLU Leafs
> +-----------
> + - EENTER: enter an enclave
> + - ERESUME: resume execution of an interrupted enclave
> + - EEXIT: exit an enclave
> + - EGETKEY: retrieve a cryptographic key from the processor
> + - EREPORT: generate a cryptographic report describing an enclave
> + - EMODPE: extend an EPC page's permissions
> + - EACCEPT: accept changes to an EPC page
> + - EACCEPTCOPY: copy an existing EPC page to an uninitialized EPC page
> +
> +ENCLV Leafs
> +-----------
> + - E{DEC,INC}VIRTCHILD: {dec,inc}rement SECS virtual refcount
> + - ESETCONTEXT: set SECS' context pointer
> +
> +
> +EPC page types
> +==============
> +
> +All pages in the EPC have an explicit page type identifying the type of page.
> +The type of page affects the page's accessibility, concurrency requirements,
> +lifecycle, etc...
> +
> +SGX Enclave Control Structure (SECS)
> + An enclave is defined and referenced by an SGX Enclave Control Structure.
> + When creating an enclave (via ECREATE), software provides a source SECS for
> + the enclave, which is copied into a target EPC page. The source SECS
> + contains security and measurement information, as well as attributes and
> + properties of the enclave. Once the SECS is copied into the EPC, it's used
> + by the processor to store enclave metadata, e.g. the number of EPC pages
> + associated with the enclave, and is no longer directly accessible by
> + software.
> +
> +Regular (REG)
> + Regular EPC pages contain the code and data of an enclave. Code and data
> + pages can be added to an uninitialized enclave (prior to EINIT) via EADD.
> + Post EINIT, pages can be added to an enclave via EAUG. Pages added via
> + EAUG must be explicitly accepted by the enclave via EACCEPT or EACCEPTCOPY.
> +
> +Thread Control Structure (TCS)
> + Thread Control Structure pages define the entry points to an enclave and
> + track the execution state of an enclave thread. A TCS can only be used by
> + a single logical CPU at any given time, but otherwise has no attachment to
> + any particular logical CPU. Like regular pages, TCS pages are added to
> + enclaves via EADD and EINIT.

but not by EAUG? IOW, no changes to a TCS after EINIT?


> +
> +Version Array (VA)
> + Version Array pages contain 512 slots, each of which can contain a version
> + number for a page evicted from the EPC. A version number is a unique 8-byte
> + value that is fed into the MAC computation used to verify the contents of an

What is MAC? I don't see it mentioned anywhere else.

> + evicted page when reloading said page into the EPC. VA pages are the only
> + page type not directly associated with an enclave, and are allocated in the
> + EPC via EPA. Note that VA pages can also be evicted from the EPC, but
> + doing so requires another VA page/slot to hold the version number of the VA
> + page being evicted.
> +
> +Trim (TRIM)
> + The Trim page type indicates that a page has been trimmed from the enclave's
> + address space and is no longer accessible to enclave software, i.e. is about
> + to be removed from the enclave (via EREMOVE). Removing pages from a running
> + enclaves requires the enclave to explicit accept the removal (via EACCEPT).

explicitly

> + The intermediate Trim type allows software to batch deallocation operations
> + to improve efficiency, e.g. minimize transitions between userspace, enclave
> + and kernel.
> +
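
The Version Array numbers above work out neatly: 512 slots of 8 bytes each fill exactly one 4KB EPC page. A rough Python sketch of the bookkeeping, with hypothetical names; it deliberately ignores the extra slots needed when VA pages themselves are evicted:

```python
VA_SLOTS_PER_PAGE = 512   # slots per Version Array page, per the text above
VERSION_SIZE = 8          # bytes per version number
EPC_PAGE_SIZE = 4096      # EPC pages are always 4KB


def va_pages_needed(evicted_pages):
    """Minimum number of VA pages holding one slot per evicted page."""
    # Ceiling division without floating point.
    return -(-evicted_pages // VA_SLOTS_PER_PAGE)
```
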
> +
> +Enclave Page Cache Map
> +======================
> +
> +The processor tracks EPC pages via the Enclave Page Cache Map (EPCM). The EPCM
> +is a processor-managed structure that enforces access restrictions to EPC pages
> +in addition to the software-managed page tables. The EPCM contains one entry
> +per EPC page, and although the details are implementation specific, all
> +implementations contain the following architectural information:
> +
> + - The status of EPC page with respect to validity and accessibility.
> + - An SECS identifier of the enclave to which the page belongs.
> + - The type of page: regular, SECS, TCS, VA or TRIM
> + - The linear address through which the enclave is allowed to access the page.
> + - The specified read/write/execute permissions on that page.
> +
> +Access violations, e.g. insufficient permissions or incorrect linear address,
> +detected via the EPCM result in a page fault (#PF)[1]_ exception being signaled
> +by the processor. Page faults due to EPCM violations set the PF_SGX flag
> +(bit 15) in the page fault error code[2]_.
> +
> +The EPCM is consulted if and only if walking the software-managed page tables,
> +i.e. the kernel's page tables, succeeds. I.e. the effective permissions for an
> +EPC page are a logical AND of the kernel's page tables and the corresponding
> +EPCM entry. This allows the kernel to make its page tables more restrictive
> +without triggering an EPCM violation, e.g. it may mark an entry as not-present
> +prior to evicting a page from the EPC.
> +
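
The "logical AND" wording can be sketched as below. This is a toy model under stated assumptions: the permission bit values and function names are invented for illustration, and the ordering follows the text (the EPCM is consulted only if the page-table walk succeeds).

```python
R, W, X = 1, 2, 4  # hypothetical permission bits for this sketch


def effective_perms(pte_perms, epcm_perms):
    """Permissions usable by the enclave: logical AND of both sources."""
    return pte_perms & epcm_perms


def check_access(access, pte_perms, epcm_perms):
    """Return which check, if any, rejects the requested access bits."""
    if access & ~pte_perms:
        # Conventional #PF; the EPCM is never consulted.
        return "page-table fault"
    if access & ~epcm_perms:
        # #PF with PF_SGX set (or #GP(0), see footnote [1]).
        return "EPCM fault"
    return "ok"
```

Clearing the page-table permissions (e.g. a not-present entry) therefore always yields a conventional fault, never an EPCM violation.
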
> +**IMPORTANT** For all intents and purposes the SGX architecture allows the
> +processor to invalidate all EPCM entries at will, i.e. requires that software
> +be prepared to handle an EPCM fault at any time. Most processors are expected
> +to implement the EPC{M} as a subset of system DRAM that is encrypted with an
> +ephemeral key, i.e. a key that is randomly generated at processor reset. As a
> +result of using an ephemeral key, the contents of the EPC{M} are lost when the
> +processor is powered down as part of an S3 transition or when a virtual machine
> +is live migrated to a new physical system.
> +
> +
> +Enclave initialization
> +======================
> +
> +Because software cannot directly access the EPC except when executing in an
> +enclave, an enclave must be built using ENCLS functions (ECREATE and EADD) as
> +opposed to simply copying the enclave from the filesystem to memory. Once an
> +enclave is built, it must be initialized (via EINIT) before userspace can enter
> +the enclave and begin `Enclave execution`_.
> +
> +During the enclave build process, two "measurements", i.e. SHA-256 hashes, are
> +taken of the enclave: MRENCLAVE and MRSIGNER. MRENCLAVE measures the enclave's
> +contents, e.g. code/data explicitly added to the measurement (via EEXTEND), as
> +well as metadata from the enclave's build process, e.g. page offsets (relative
> +to the enclave's base) and page permissions of all pages added to the enclave
> +(via EADD). MRENCLAVE is initialized by ECREATE and finalized by EINIT.
> +MRSIGNER is simply the SHA-256 hash of the public key used to sign the enclave.
> +
> +EINIT accepts two parameters in addition to the SECS of the target enclave: an
> +Enclave Signature Struct (SIGSTRUCT) and an EINIT token (EINITTOKEN).
> +SIGSTRUCT is a structure created and signed by the enclave's developer. Among
> +other fields, SIGSTRUCT contains the expected MRENCLAVE of the enclave and the
> +MRSIGNER of the enclave. SIGSTRUCT's MRENCLAVE is used by the processor to
> +verify that the enclave was properly built (at runtime), and its SIGSTRUCT is
> +copied to the SECS upon successful EINIT. EINITTOKEN is an optional parameter
> +that is consumed as part of `Launch Control`_.
> +
> +
> +Enclave execution
> +=================
> +
> +Enclaves execute in a bespoke sub-mode of ring 3, appropriately named Enclave
> +Mode. Enclave Mode changes behavior in key ways to support SGX's security
> +guarantees and to reduce the probability of unintentional disclosure of
> +sensitive data.
> +
> +A notable cornerstone of Enclave Mode is the Enclave Linear Range (ELRANGE).
> +An enclave is associated with one, and only one, contiguous linear address
> +range, its ELRANGE. The ELRANGE is specified via the SIZE and BASEADDR fields
> +in the SECS (provided to ECREATE). The processor queries the active enclave's
> +ELRANGE to differentiate enclave and non-enclave accesses, i.e. accesses that
> +originate in Enclave Mode *and* whose linear address falls within ELRANGE are
> +considered (direct) enclave accesses. Note, the processor also generates
> +(indirect) enclave accesses when executing ENCL* instructions, which may occur
> +outside of Enclave Mode, e.g. when copying the SECS to its target EPC page
> +during ECREATE.
> +
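
The direct-access classification described above reduces to a range check combined with the mode. A minimal sketch, where ``base`` and ``size`` stand in for the SECS BASEADDR and SIZE fields and the function names are hypothetical:

```python
def in_elrange(addr, base, size):
    """True if the linear address falls inside [base, base + size)."""
    return base <= addr < base + size


def is_direct_enclave_access(enclave_mode, addr, base, size):
    """Direct enclave access: originates in Enclave Mode *and* hits ELRANGE."""
    return enclave_mode and in_elrange(addr, base, size)
```

Indirect enclave accesses generated by ENCL* functions are outside the scope of this sketch.
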
> +Enclave Mode changes include, but are not limited to:
> +
> + - Permits direct software access to EPC pages owned by the enclave
> + - Ensures enclave accesses map to the EPC (EPCM violation, i.e. #PF w/ PF_SGX)
> + - Prevents executing code outside the enclave's ELRANGE (#GP fault)
> + - Changes the behavior of exceptions/events
> + - Causes many instructions to become illegal, i.e. generate an exception
> + - Supresses all instruction breakpoints*

Suppresses

> + - Suppresses data breakpoints within enclave's ELRANGE*
> +
> + * For non-debug enclaves.
> +
> +Transitions to/from Enclave Mode have semantics that are a lovely blend of
> +SYSCALL, SYSRET and VM-Exit. In normal execution, entering and exiting Enclave
> +Mode can only be done through EENTER and EEXIT respectively. EENTER+EEXIT is
> +analogous to SYSCALL+SYSRET, e.g. EENTER/SYSCALL load RCX with the next RIP and
> +EEXIT/SYSRET load RIP from R{B,C}X, and EENTER can only jump to a predefined
> +location controlled by the enclave/kernel.
> +
> +But when an exception, interrupt, VM-Exit, etc... occurs, enclave transitions

etc. occurs,

> +behave more like VM-Exit and VMRESUME. To maintain the black box nature of the
> +enclave, the processor automatically switches register context when any of the
> +aforementioned events occur (the SDM refers to such events as Enclave Exiting
> +Events (EEE)).
> +
> +To handle an EEE, the processor performs an Asynchronous Enclave Exits (AEX).

Exit {?}

> +Note, although exceptions and traps are synchronous from a processor execution
> +perspective, they are asynchronous from the enclave's perspective as the enclave
> +is not provided an opportunity to save/fuzz state prior to exiting the enclave.
> +On an AEX, the processor exits the enclave to a predefined %rip called the
> +Asynchronous Exiting Pointer (AEP). The AEP is specified at enclave entry (via
> +EENTER/ERESUME) and saved into the associated TCS, similar to how a hypervisor
> +specifies the VM-Exit target (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME), i.e. the
> +AEP is an exit location controlled by the enclave's untrusted runtime.
> +
> +On an AEX, the processor fully exits the enclave prior to vectoring the event,
> +i.e. from the event handler's perspective the event occurred at the AEP. Thus,
> +IRET/RSM/VMRESUME (from the event handler) returns control to the enclave's
> +untrusted runtime, which can take appropriate action, e.g. immediately ERESUME
> +the enclave on interrupts, forward expected exceptions to the enclave, restart
> +the enclave on fatal exceptions, and so on and so forth.
> +
> +To preserve the enclave's state across AEX events, the processor automatically
> +saves architectural into a State Save Area (SSA). Because SGX supports nested

saves architectural state into

> +AEX events, e.g. the untrusted runtime can re-EENTER the enclave after an AEX,
> +which can in turn trigger an AEX, the TCS holds a pointer to a stack of SSA
> +frames (as opposed to a single SSA), an index to the current SSA frame and the
> +total number of available frames. When an AEX occurs, the processor saves the
> +architectural state into the TCS's current SSA frame. The untrusted runtime
> +can then pop the last SSA frame (off the TCS's stack) via ERESUME, i.e. restart
> +the enclave after the AEX is handled.
> +
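
The SSA stack behaviour can be pictured with a toy model. The class and attribute names below are invented for this sketch; it only mirrors the description above (a stack of frames, a current index, a total count) and is not how the processor represents a TCS.

```python
class ToyTcs:
    """Toy model of a TCS's SSA stack: push on AEX, pop on ERESUME."""

    def __init__(self, nssa):
        self.nssa = nssa              # total number of available SSA frames
        self.cssa = 0                 # index of the current SSA frame
        self.frames = [None] * nssa   # saved architectural state per frame

    def can_enter(self):
        # Entering requires a free frame for a possible AEX.
        return self.cssa < self.nssa

    def aex(self, state):
        """Save state into the current frame and advance the index."""
        if not self.can_enter():
            raise RuntimeError("no free SSA frame")
        self.frames[self.cssa] = state
        self.cssa += 1

    def eresume(self):
        """Pop the most recently saved frame and return its state."""
        if self.cssa == 0:
            raise RuntimeError("nothing to resume")
        self.cssa -= 1
        return self.frames[self.cssa]
```

With two frames, an AEX followed by a re-entry and a second AEX exhausts the stack; the two ERESUMEs then unwind in reverse order.
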
> +
> +Launch Control
> +==============
> +
> +SGX provides a set of controls, referred to as Launch Control, that governs the
> +initialization of enclaves. The processor internally stores a SHA-256 hash of
> +a 3072-bit RSA public key, i.e. a MRSIGNER, often referred to as the "LE pubkey
> +hash". The LE pubkey hash is used during EINIT to prevent launching an enclave
> +without proper authorization. In order for EINIT to succeed, the enclave's
> +MRSIGNER (from SIGSTRUCT) *or* the MRSIGNER of the enclave's EINITTOKEN must
> +match the LE pubkey hash.
> +
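
A sketch of the EINIT authorization rule as this paragraph states it. The helper names are hypothetical, and the exact byte encoding of the public key that gets hashed is an assumption here (the text only says "SHA-256 hash of the public key"); ``hashlib.sha256`` is the standard-library call.

```python
import hashlib


def mrsigner(pubkey_bytes):
    """SHA-256 of the signer's public key (byte encoding is an assumption)."""
    return hashlib.sha256(pubkey_bytes).digest()


def launch_allowed(le_pubkey_hash, enclave_mrsigner, token_mrsigner=None):
    """EINIT passes if the enclave's *or* the token's MRSIGNER matches."""
    return le_pubkey_hash in (enclave_mrsigner, token_mrsigner)
```
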
> +An EINITTOKEN can only be created by a so called Launch Enclave (LE). A LE is

so-called

> +an enclave with SECS.ATTRIBUTES.EINITTOKEN_KEY=1, which grants it access to the
> +EINITTOKEN_KEY (retrieved via EGETKEY). EINITTOKENs provide a ready-built
> +mechanism for userspace to bless enclaves without requiring additional kernel
> +infrastructure.
> +
> +Processors that support SGX Launch Control Configuration, enumerated by the
> +SGX_LC flag (bit 30 in CPUID 0x7.0x0.ECX), expose the LE pubkey hash as a set
> +of four MSRs, aptly named IA32_SGXLEPUBKEYHASH[0-3]. The reset value of the
> +MSRs is an internally defined (Intel) key (processors that don't support
> +SGX_LC also use an internally defined key, it's just not exposed to software).
> +
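
For readers wondering how a 32-byte hash maps onto four MSRs, a hedged sketch: it assumes each MSR holds one 64-bit chunk of the hash in little-endian order, which the text above does not spell out. The SGX_LC bit position is taken from the paragraph; the function names are made up.

```python
import struct

SGX_LC = 1 << 30  # bit 30 of CPUID 0x7.0x0.ECX, per the text above


def has_sgx_lc(cpuid_7_0_ecx):
    """True if the SGX_LC feature bit is set in the given ECX value."""
    return bool(cpuid_7_0_ecx & SGX_LC)


def hash_to_msr_values(pubkey_hash):
    """Split a 32-byte hash into four 64-bit values (little-endian assumed)."""
    if len(pubkey_hash) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    return struct.unpack("<4Q", pubkey_hash)
```
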
> +While the IA32_SGXLEPUBKEYHASH MSRs are readable on any platform that supports
> +SGX_LC, the MSRs are only writable if the IA32_FEATURE_CONTROL is locked with
> +bit 17 ("SGX Launch Control Enable" per the SDM, or more accurately "SGX LE
> +pubkey hash writable") set to '1'. Note, the MSRs are also writable prior to
> +`SGX activation`_.
> +
> +Note, while "Launch Control Configuration" is the official feature name used by
> +the Intel SDM, other documentation may use the term "Flexible Launch Control",
> +or even simply "Launch Control". Colloquially, the vast majority of usage of
> +the term "Launch Control" is synonymous with "Launch Control Configuration".
> +
> +
> +EPC oversubscription
> +====================
> +
> +SGX supports the concept of EPC oversubscription. Analogous to swapping system
> +DRAM to disk, enclave pages can be swapped from the EPC to memory, and later
> +reloaded from memory to the EPC. But because the kernel is untrusted, swapping
> +pages in/out of the EPC has specialized requirements:
> +
> + - The kernel cannot directly access EPC memory, i.e. cannot copy data to/from
> + the EPC.
> + - The kernel must "prove" to hardware that there are no valid TLB entries for
> + said page prior to eviction (a stale TLB entry would allow an attacker to
> + bypass SGX access controls).
> + - When loading a page back into the EPC, hardware must be able to verify
> + the integrity and freshness of the data.
> + - When loading an enclave page, e.g. regular and TCS pages, hardware must be
> + able to associate the page with an SECS, i.e. refcount an enclaves pages.

enclave's

> +
> +To satisfy the above requirements, the CPU provides dedicated ENCLS functions
> +to support paging data in/out of the EPC:
> +
> + - EBLOCK: Mark a page as blocked in the EPC Map (EPCM). Attempting to access
> + a blocked page that misses the TLB will fault.
> + - ETRACK: Activate TLB tracking. Hardware verifies that all translations for
> + pages marked as "blocked" have been flushed from the TLB.
> + - EPA: Add Version Array page to the EPC (see `EPC page types`_)
> + - EWB: Write back a page from EPC to memory, e.g. RAM. Software must
> + supply a VA slot, memory to hold the Paging Crypto Metadata (PCMD) of the
> + page and obviously backing for the evicted page.
> + - ELD*: Load a page in {un}blocked state from memory to EPC.
> +
> +Swapped EPC pages are {de,en}crypted on their way in/out of the EPC, e.g. EWB
> +encrypts and ELDU decrypts. The version number (stored in a VA page) and PCMD
> +structure associated with an evicted EPC page seal a page (prevent undetected
> +modification) and ensure its freshness (prevent rollback to a stale version of
> +the page) while the page resides in unprotected storage, e.g. memory or disk.
> +
> +
> +Attestation
> +===========
> +
> +SGX provides mechanisms that allow software to implement what Intel refers to
> +as Local Attestation (used by enclaves running on the same physical platform
> +to securely identify one another) and Remote Attestation (a process by which an
> +enclave attests itself to a remote entity in order to gain the trust of said
> +entity).
> +
> +The details of Local Attestation and Remote Attestation are far beyond the
> +scope of this document. Please see Intel's Software Developer's Manual and/or
> +use your search engine of choice to learn more about SGX's attestation
> +capabilities.
> +
> +
> +EPC Implemenations
> +==================
> +
> +PRM with MEE
> +--------------

wrong length underline!

> +
> +Initial hardware support for SGX implements the EPC by reserving a chunk of
> +system DRAM, referred to as Processor Reserved Memory (PRM). A percentage of
> +PRM is consumed by the processor to implement the EPCM, with the remainder of
> +PRM being exposed to software as the EPC. PRM is configured by firmware via
> +dedicated PRM Range Registers (PRMRRs). The PRMRRs are locked which are locked as part of SGX activation, i.e.

confusing. "are locked which are locked"

> +resizing the PRM, and thus EPC, requires rebooting the system.
> +
> +An autonomous hardware unit called the Memory Encryption Engine (MEE) protects
> +the confidentiality, integrity, and freshness of the PRM, e.g. {de,en}crypts
> +data as it is read/written from/to DRAM to provide confidentiality.
> +
> +
> +SGX activation
> +==============
> +
> +Before SGX can be fully enabled, e.g. via FEATURE_CONTROL, the platform must
> +undergo explicit SGX activation. SGX activation is a mechanism by which the
> +processor verifies and locks the platform configuration set by pre-boot
> +firmware, e.g. to ensure it satisfies SGX's security requirements. Before
> +SGX is activated (and its configuration locked), firmware can modify the
> +PRMRRs, e.g. to set the base/size of the PRM and thus EPC, and can also write
> +the SGX_LEPUBKEYHASH MSRs. Notably, the latter allows pre-boot firmware to
> +lock the SGX_LEPUBKEYHASH MSRs to a non-Intel value by writing the MSRs and
> +locking MSR_IA32_FEATURE_CONTROL without setting the "SGX LE pubkey hash
> +writable" flag, i.e. making the SGX_LEPUBKEYHASH MSRs readonly.
> +
> +
> +Footnotes
> +=========
> +
> +.. [1] All processors that do not support the SGX2 ISA take an errata and
> + signal #GP(0) instead of #PF(PF_SGX) when vectoring EPCM violations and
> + faults due to enclave-accesses to non-EPC memory.
> +
> +.. [2] Note that despite being vectored as a #PF, a #PF with PF_SGX has nothing
> + to do with conventional paging.
> +

--
~Randy