Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
From: Haitao Huang
Date:  Wed Aug 31 2022 - 11:18:08 EST
Hi Kai
On Tue, 30 Aug 2022 22:17:08 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:
On Wed, 2022-08-31 at 05:57 +0300, jarkko@xxxxxxxxxx wrote:
On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 05:44 +0300, jarkko@xxxxxxxxxx wrote:
> > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@xxxxxxxxxx wrote:
> > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > Hi Jarkko,
> > > > > >
> > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > prematurely stopped.
> > > > > >
> > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > >
> > > > > > IMHO "a failure" might be too vague.  For instance, failure of sgx_drv_init()
> > > > > > won't immediately result in ksgxd stopping prematurely.  As long as KVM SGX can
> > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > >
> > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > >
> > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > >
> > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > >
> > > > From another perspective, the *existing flow* is the reason which causes this
> > > > bug.  A real fix is to fix the flow itself.
> >
> > Any existing flow in part of the kernel can have a bug. That
> > does not mean that switching flow would be proper way to fix
> > a bug.
> >
> > BR, Jarkko
>
> Yes but I think this is only true when the flow is reasonable.  If the flow
> itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
>
> Anyway, let us also hear from others.
The flow can be made to work without issues, which in the
context of a bug fix is exactly what a bug fix should do.
Not more or less.
No. To me the flow itself is buggy.  There's no reason to start ksgxd() before
at least SGX driver is initialized to work.
Will it cause a race if we expose dev nodes to user space before
ksgxd is started and sanitization is done?
Patching the buggy flow is more like a workaround, but isn't a real fix.
You don't gain any measurable value for the user with this
switch idea.
There is actual gain by moving sgx_page_reclaimer_init() to sgx_drv_init(), or
only calling sgx_page_reclaimer_init() when sgx_drv_init() returns success:
If somehow sgx_drv_init() fails to initialize, ksgxd() won't run.
Currently, if SGX driver fails to initialize but virtual EPC initializes
successfully, ksgxd() still runs. However it achieves nothing but only wastes
CPU cycles.
You still need ksgxd for sanitizing (at least) and swapping (potentially)
even if only virtual EPC initializes.
Thanks
Haitao
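
For reference, the ordering being debated can be sketched as a small userspace
model. This is not the kernel code: struct outcome, model_sgx_init() and the
ksgxd_running flag are hypothetical names made up for illustration, and the
model only assumes what the thread states, namely that ksgxd is started (via
sgx_page_reclaimer_init()) before the provision device and the two drivers are
set up, and that sgx_init() still returns 0 as long as one of the driver and
virtual EPC initializes.

```c
#include <stdbool.h>

/* Hypothetical knobs: which init steps succeed in this model. */
struct outcome {
	bool provision_ok;	/* stands in for misc_register() of the provision device */
	bool drv_ok;		/* stands in for sgx_drv_init() */
	bool vepc_ok;		/* stands in for sgx_vepc_init() */
};

/* Hypothetical flag: true while the modelled ksgxd thread is running. */
static bool ksgxd_running;

/*
 * Userspace model of the ordering discussed in this thread, NOT the
 * kernel function. Returns 0 on success and -1 on failure.
 */
static int model_sgx_init(struct outcome o)
{
	/* ksgxd is started first, standing in for sgx_page_reclaimer_init(). */
	ksgxd_running = true;

	if (!o.provision_ok)
		goto err_kthread;

	/* Both drivers are tried; init fails only if both of them fail. */
	if (!o.drv_ok && !o.vepc_ok)
		goto err_kthread;

	return 0;

err_kthread:
	/* ksgxd is stopped here, possibly before sanitization has finished. */
	ksgxd_running = false;
	return -1;
}
```

In this model, a failing drv_ok with a working vepc_ok still leaves ksgxd
running, which is the case where it is needed for sanitizing, while a failing
provision_ok stops ksgxd no matter what the drivers would have done.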