Re: [PATCH 2/2] selftests/sgx: Add SGX selftest augment_via_eaccept_long
From: Reinette Chatre
Date: Wed Aug 17 2022 - 00:35:41 EST
Hi Vijay,
On 8/16/2022 6:27 PM, Dhanraj, Vijay wrote:
> Hi Jarkko, Reinette,
>
>> -----Original Message-----
>> From: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
>> Sent: Tuesday, August 16, 2022 4:34 PM
>> To: Chatre, Reinette <reinette.chatre@xxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>; linux-
>> sgx@xxxxxxxxxxxxxxx; Dhanraj, Vijay <vijay.dhanraj@xxxxxxxxx>; Shuah Khan
>> <shuah@xxxxxxxxxx>; open list:KERNEL SELFTEST FRAMEWORK <linux-
>> kselftest@xxxxxxxxxxxxxxx>; open list <linux-kernel@xxxxxxxxxxxxxxx>
>> Subject: Re: [PATCH 2/2] selftests/sgx: Add SGX selftest
>> augment_via_eaccept_long
>>
>> On Tue, Aug 16, 2022 at 09:26:40AM -0700, Reinette Chatre wrote:
>>> Hi Vijay,
>>>
>>> Thank you very much for digging into this. A few comments below.
>>>
>>> On 8/15/2022 4:39 PM, Jarkko Sakkinen wrote:
...
>>>> @@ -25,6 +25,8 @@ static const uint64_t MAGIC =
>>>> 0x1122334455667788ULL; static const uint64_t MAGIC2 =
>>>> 0x8877665544332211ULL; vdso_sgx_enter_enclave_t
>>>> vdso_sgx_enter_enclave;
>>>>
>>>> +static const unsigned long edmm_size = 8589934592; //8G
>>>> +
>>>
>>> Could you please elaborate how this constant was chosen? I understand
>>> that this test helped to uncover a bug and it is useful to add to the
>>> kernel. When doing so this test will be run on systems with a variety
>>> of SGX memory sizes, could you please elaborate (and add a
>>> snippet) how 8GB is the right value for all systems?
>>
>> It is the only constant I know for sure that some people (Vijay and Haitao)
>> have been able to reproduce the bug.
>>
>> Unless someone can show that the same bug reproduces with a smaller
>> constant, changing it would make the whole test irrelevant.
>
> I tried with 2GB and it always succeed and with 4GB was able to repro sporadically. But with 8GB failure was consistent. One thing to note is even with 8GB Haitao couldn't reproduce this every time. So not sure if it good for all the systems but on my ICX system, I was able to consistently repro with this value.
>
Could all of this information be placed in a description of this constant? At this time
it appears to be arbitrary.
>>>> +
>>>> + if (!sgx2_supported())
>>>> + SKIP(return, "SGX2 not supported");
>>>> +
>>>> + ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self-
>>> encl,
>>>> +_metadata));
>>>> +
>>>> + memset(&self->run, 0, sizeof(self->run));
>>>> + self->run.tcs = self->encl.encl_base;
>>>> +
>>>> + for (i = 0; i < self->encl.nr_segments; i++) {
>>>> + struct encl_segment *seg = &self->encl.segment_tbl[i];
>>>> +
>>>> + total_size += seg->size;
>>>> + TH_LOG("test enclave: total_size = %ld, seg->size = %ld",
>> total_size, seg->size);
>>>> + }
>>>> +
>>>> + /*
>>>> + * Actual enclave size is expected to be larger than the loaded
>>>> + * test enclave since enclave size must be a power of 2 in bytes while
>>>> + * test_encl does not consume it all.
>>>> + */
>>>> + EXPECT_LT(total_size + edmm_size, self->encl.encl_size);
>>>
>>> Will this test ever fail?
>>
>> With a *quick* look: no.
>>
>> Vijay, what was the point of this check?
>
> Yes we can remove this check. I tried to copy from `augment_via_eaccept` and just changed the request size.
>
In augment_via_eaccept the check is required since augment_via_eaccept assumes
that there is enough address space in the existing enclave for dynamic memory addition
without needing to change the enclave size. If anybody later changes the test
enclave to break this assumption then that check will pick it up.
In this new test the enclave size is set to accommodate the planned
dynamic memory addition and thus adding a test to check if the enclave
has enough space for the dynamic memory is not needed.
>>>> + TH_LOG("Entering enclave to run EACCEPT for each page of %zd
>> bytes may take a while ...",
>>>> + edmm_size);
>>>> + eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W |
>> SGX_SECINFO_REG | SGX_SECINFO_PENDING;
>>>> + eaccept_op.ret = 0;
>>>> + eaccept_op.header.type = ENCL_OP_EACCEPT;
>>>> +
>>>> + for (i = 0; i < edmm_size; i += 4096) {
>>>> + eaccept_op.epc_addr = (uint64_t)(addr + i);
>>>> +
>>>> + EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
>>>> + if (self->run.exception_vector == 14 &&
>>>> + self->run.exception_error_code == 4 &&
>>>> + self->run.exception_addr == self->encl.encl_base) {
>>>> + munmap(addr, edmm_size);
>>>> + SKIP(return, "Kernel does not support adding pages
>> to initialized enclave");
>>>> + }
>>>> +
>>>> + EXPECT_EQ(self->run.exception_vector, 0);
>>>> + EXPECT_EQ(self->run.exception_error_code, 0);
>>>> + EXPECT_EQ(self->run.exception_addr, 0);
>>>> + ASSERT_EQ(eaccept_op.ret, 0);
>>>> + ASSERT_EQ(self->run.function, EEXIT);
>>>> + }
>>>> +
>>>> + /*
>>>> + * New page should be accessible from within enclave - attempt to
>>>> + * write to it.
>>>> + */
>>>
>>> This portion below was also copied from previous test and by only
>>> testing a write to the first page of the range the purpose is not
>>> clear. Could you please elaborate if the intention is to only test
>>> accessibility of the first page and why that is sufficient?
>>
>> It is sufficient because the test reproduces the bug. It would have to be
>> rather elaborated why you would possibly want to do more than that.
That is fair. An accurate comment (currently an inaccurate copy&paste) would
help to explain this part of the test.
>>>> + put_addr_op.value = MAGIC;
>>>> + put_addr_op.addr = (unsigned long)addr;
>>>> + put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
>>>> +
>>>> + EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
>>>> +
>>>> + EXPECT_EEXIT(&self->run);
>>>> + EXPECT_EQ(self->run.exception_vector, 0);
>>>> + EXPECT_EQ(self->run.exception_error_code, 0);
>>>> + EXPECT_EQ(self->run.exception_addr, 0);
>>>> +
>>>> + /*
>>>> + * Read memory from newly added page that was just written to,
>>>> + * confirming that data previously written (MAGIC) is present.
>>>> + */
>>>> + get_addr_op.value = 0;
>>>> + get_addr_op.addr = (unsigned long)addr;
>>>> + get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
>>>> +
>>>> + EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
>>>> +
>>>> + EXPECT_EQ(get_addr_op.value, MAGIC);
>>>> + EXPECT_EEXIT(&self->run);
>>>> + EXPECT_EQ(self->run.exception_vector, 0);
>>>> + EXPECT_EQ(self->run.exception_error_code, 0);
>>>> + EXPECT_EQ(self->run.exception_addr, 0);
>>>> +
>>>> + munmap(addr, edmm_size);
>>>> +}
>>>> +
>>>> /*
>>>> * SGX2 page type modification test in two phases:
>>>> * Phase 1:
>>>> diff --git a/tools/testing/selftests/sgx/main.h
>>>> b/tools/testing/selftests/sgx/main.h
>>>> index fc585be97e2f..fe5d39ac0e1e 100644
>>>> --- a/tools/testing/selftests/sgx/main.h
>>>> +++ b/tools/testing/selftests/sgx/main.h
>>>> @@ -35,7 +35,8 @@ extern unsigned char sign_key[]; extern unsigned
>>>> char sign_key_end[];
>>>>
>>>> void encl_delete(struct encl *ctx); -bool encl_load(const char
>>>> *path, struct encl *encl, unsigned long heap_size);
>>>> +bool encl_load(const char *path, struct encl *encl, unsigned long
>> heap_size,
>>>> + unsigned long edmm_size);
>>>> bool encl_measure(struct encl *encl); bool encl_build(struct encl
>>>> *encl); uint64_t encl_get_entry(struct encl *encl, const char
>>>> *symbol); diff --git a/tools/testing/selftests/sgx/sigstruct.c
>>>> b/tools/testing/selftests/sgx/sigstruct.c
>>>> index 50c5ab1aa6fa..6000cf0e4975 100644
>>>> --- a/tools/testing/selftests/sgx/sigstruct.c
>>>> +++ b/tools/testing/selftests/sgx/sigstruct.c
>>>> @@ -343,7 +343,7 @@ bool encl_measure(struct encl *encl)
>>>> if (!ctx)
>>>> goto err;
>>>>
>>>> - if (!mrenclave_ecreate(ctx, encl->src_size))
>>>> + if (!mrenclave_ecreate(ctx, encl->encl_size))
>>>> goto err;
>>>>
>>>> for (i = 0; i < encl->nr_segments; i++) {
>>>
>>>
>>> Looking at mrenclave_ecreate() the above snippet seems separate from
>>> this test and incomplete since it now obtains encl->encl_size but
>>> continues to compute it again internally. Should this be a separate fix?
>>
>> I would remove this part completely but this also needs comment from Vijay.
>
> If we restrict the large enclave size just for this test, then the above change can be reverted. Calling ` mrenclave_ecreate` with src_size esults in EINIT failure and I think the reason is because of incorrect MRenclave.
>From what I understand this change is needed since the enclave size is no longer just
the size of all the segments at enclave creation. I think it is incomplete though since it
still recomputes the enclave size even though it is now provided as parameter.
This change does not need to be part of this test addition.
Reinette