Re: [PATCH v13 12/14] x86/sgx: Turn on per-cgroup EPC reclamation

From: Haitao Huang
Date: Mon May 06 2024 - 23:21:57 EST


On Mon, 06 May 2024 19:10:42 -0500, Huang, Kai <kai.huang@xxxxxxxxx> wrote:



On 1/05/2024 7:51 am, Haitao Huang wrote:
 static void sgx_reclaim_pages_global(struct mm_struct *charge_mm)
 {
-	sgx_reclaim_pages(&sgx_global_lru, charge_mm);
+	if (IS_ENABLED(CONFIG_CGROUP_MISC))
+		sgx_cgroup_reclaim_pages(misc_cg_root(), charge_mm);
+	else
+		sgx_reclaim_pages(&sgx_global_lru, charge_mm);
 }


I think we have a problem here when we do global reclaim starting from the ROOT cgroup:

This function will mostly only try to reclaim from the ROOT cgroup, and won't reclaim from the descendants.

The reason is that sgx_cgroup_reclaim_pages() simply returns after "scanning" SGX_NR_TO_SCAN (16) pages w/o going to the descendants, and "scanning" here simply means "removing the EPC page from the cgroup's LRU list".

So as long as the ROOT cgroup LRU contains more than SGX_NR_TO_SCAN (16) pages, sgx_cgroup_reclaim_pages() will effectively just scan and return w/o going into the descendants. Having 16 EPC pages should be an "almost always true" case, I suppose.

When sgx_reclaim_pages_global() is called again, we will start from the ROOT again.

That means this doesn't truly reclaim "from global" at all.
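To make the concern above concrete, here is a minimal toy model in plain C. It is not the kernel code: struct toy_cgroup, toy_reclaim_pages() and the single-child hierarchy are hypothetical names invented for illustration, and "scanning" is reduced to decrementing a counter. It only shows that a walk which returns after NR_TO_SCAN pages never reaches the descendant while the root LRU holds at least that many pages.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TO_SCAN 16	/* mirrors SGX_NR_TO_SCAN (16) from the discussion */

/* Hypothetical stand-in for a cgroup: just a page count on its LRU. */
struct toy_cgroup {
	int lru_pages;			/* pages currently on this LRU */
	struct toy_cgroup *child;	/* one descendant, to keep the walk trivial */
};

/*
 * Toy walk: start at 'root', take pages off each LRU in pre-order and
 * stop as soon as NR_TO_SCAN pages have been scanned.
 * Returns the number of pages scanned.
 */
static int toy_reclaim_pages(struct toy_cgroup *root)
{
	int scanned = 0;

	for (struct toy_cgroup *cg = root; cg && scanned < NR_TO_SCAN;
	     cg = cg->child) {
		int take = cg->lru_pages;

		assert(take >= 0);
		if (take > NR_TO_SCAN - scanned)
			take = NR_TO_SCAN - scanned;

		cg->lru_pages -= take;	/* "scan" = remove from the LRU */
		scanned += take;
	}

	return scanned;
}
```

With 64 pages on the root and 64 on the child, one call takes all 16 pages from the root and leaves the child untouched; the next call starts from the root again.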

IMHO the behaviour of sgx_cgroup_reclaim_pages() is OK for per-cgroup reclaim, because I think in that case our intention is to try our best to reclaim from the cgroup itself, i.e., whether we can reclaim from descendants doesn't matter.

But for global reclaim this doesn't work.

Am I missing anything?

Good catch. This is indeed a problem if pages in a higher-level cgroup are always busy (being 'young'). The reclamation loop starting from this group may get stuck just shifting pages from the front to the tail of this group's LRU, and never try to scan & reclaim pages in its descendants.
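A rough sketch of that stuck case, again as a toy model rather than the actual SGX code (struct busy_cgroup, busy_reclaim_pass() and the all_young flag are hypothetical, and the hierarchy is reduced to one child). Young pages are scanned and put back at the tail, so the LRU length never shrinks and every pass spends its whole scan budget on the same group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TO_SCAN 16	/* mirrors SGX_NR_TO_SCAN (16) from the discussion */

/* Hypothetical stand-in for a cgroup whose pages may all be 'young'. */
struct busy_cgroup {
	int lru_pages;			/* pages currently on this LRU */
	bool all_young;			/* true: every scanned page is put back */
	int visits;			/* how many passes scanned this group */
	struct busy_cgroup *child;	/* one descendant, for simplicity */
};

/* One reclaim pass from 'root'; returns pages actually reclaimed. */
static int busy_reclaim_pass(struct busy_cgroup *root)
{
	int scanned = 0, reclaimed = 0;

	for (struct busy_cgroup *cg = root; cg && scanned < NR_TO_SCAN;
	     cg = cg->child) {
		int take = cg->lru_pages;

		if (take > NR_TO_SCAN - scanned)
			take = NR_TO_SCAN - scanned;

		cg->visits++;
		scanned += take;

		if (!cg->all_young) {
			/* Cold pages really leave the LRU. */
			cg->lru_pages -= take;
			reclaimed += take;
		}
		/* Young pages return to the tail: lru_pages is unchanged. */
	}

	return reclaimed;
}
```

In this model, with 32 young pages on the root and 32 cold pages on the child, any number of passes reclaims nothing and never visits the child, even though the child has reclaimable pages.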

Though this may not happen often, I think it does require a fix. Will do it in v14 :-)

Thanks
Haitao