Re: [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves

From: Reinette Chatre
Date: Thu Jan 20 2022 - 11:28:47 EST


Hi Jarkko,

On 1/20/2022 5:01 AM, Jarkko Sakkinen wrote:
> On Tue, 2022-01-18 at 11:14 -0800, Reinette Chatre wrote:
>> Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
>> triggers the softlockup detector.
>>
>> Actual SGX systems have 128GB of enclave memory or more.  The
>> "unclobbered_vdso_oversubscribed" selftest creates one enclave which
>> consumes all of the enclave memory on the system. Tearing down such a
>> large enclave takes around a minute, most of it in the loop where
>> the EREMOVE instruction is applied to each individual 4k enclave
>> page.
>>
>> Spending one minute in a loop triggers the softlockup detector.
>>
>> Add a cond_resched() to give other tasks a chance to run and placate
>> the softlockup detector.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
>> Reported-by: Vijay Dhanraj <vijay.dhanraj@xxxxxxxxx>
>> Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>> ---
>> Softlockup message:
>> watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [test_sgx:11502]
>> Kernel panic - not syncing: softlockup: hung tasks
>> <snip>
>> sgx_encl_release+0x86/0x1c0
>> sgx_release+0x11c/0x130
>> __fput+0xb0/0x280
>> ____fput+0xe/0x10
>> task_work_run+0x6c/0xc0
>> exit_to_user_mode_prepare+0x1eb/0x1f0
>> syscall_exit_to_user_mode+0x1d/0x50
>> do_syscall_64+0x46/0xb0
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index 001808e3901c..ab2b79327a8a 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -410,6 +410,7 @@ void sgx_encl_release(struct kref *ref)
>>                 }
>>  
>>                 kfree(entry);
>> +               cond_resched();
>>         }
>>  
>>         xa_destroy(&encl->page_array);
>
> I'd add a comment, e.g.
>
> /* Invoke scheduler to prevent soft lockups. */

I could do that. I would like to point out, though, that the driver
already uses cond_resched() in six other places, none of them commented,
so an uncommented call does seem to be the common pattern. This seventh
use would be the first in the driver to carry a comment documenting
cond_resched().
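
For reference, here is roughly how the commented call would sit at the end
of the loop body. This is only a userspace sketch, not the driver code:
struct page_entry, build_list(), release_all() and yield_point() are
hypothetical names made up for illustration, and sched_yield() stands in
for the kernel's cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-page entry; not the driver's struct. */
struct page_entry {
	struct page_entry *next;
};

/* Userspace stand-in for cond_resched(): offer the CPU to other tasks. */
static void yield_point(void)
{
	sched_yield();
}

/* Build a singly linked list of n entries (illustration only). */
static struct page_entry *build_list(size_t n)
{
	struct page_entry *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct page_entry *e = malloc(sizeof(*e));

		assert(e != NULL);
		e->next = head;
		head = e;
	}
	return head;
}

/* Free every entry, yielding after each one; returns the number freed. */
static size_t release_all(struct page_entry *head)
{
	size_t freed = 0;

	while (head) {
		struct page_entry *next = head->next;

		free(head);
		freed++;
		/* Invoke scheduler to prevent soft lockups. */
		yield_point();
		head = next;
	}
	return freed;
}
```

The yield sits after the free so that every iteration, however long the
list, ends with a chance for other tasks to run.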

>
> Other than that makes sense.

Thank you very much for taking a look.

Reinette