Re: [PATCH 2/4] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs

From: Guorui Yu
Date: Mon Jan 30 2023 - 08:45:31 EST


Hi Andi,

On 2023/1/30 14:46, Andi Kleen wrote:

I try to solve this problem by creating a new kernel thread, "kccd", to populate the TLB buffers in the background.

Specifically,
1. A new kernel thread is created with the help of "arch_initcall", and this kthread is responsible for memory allocation and for setting memory attributes (private or shared);
2. The "swiotlb_tbl_map_single" routine only uses the spinlock-protected TLB buffers pre-allocated by the kthread;
  a) this path actually includes ONE memory allocation, brought in by the xarray insertion "__xa_insert__".

That already seems dangerous with all the usual problems of memory allocations in IO paths. Normally code at least uses a mempool to avoid the worst deadlock potential.


The __xa_insert__ is called with GFP_NOWAIT (GFP_ATOMIC & ~__GFP_HIGH), and I will dig into this to check whether there is any chance of a deadlock; a rough sketch of that path is included below.

I have also tested this code as thoroughly as I could, and no issues were found under loads of up to 700,000 IOPS.

Thanks for your advice on this point; I had not noticed this possibility.
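For reference, here is a rough sketch (not the exact code in the patch) of how the map path described in points 1/2 consumes a pre-allocated buffer and records it with a GFP_NOWAIT xarray insertion. The pool helpers cc_pool_take()/cc_pool_put()/cc_pool_level() and the names cc_active, cc_pool_lock, cc_watermark and kccd_task are illustrative only:

#include <linux/spinlock.h>
#include <linux/xarray.h>
#include <linux/sched.h>
#include <linux/mm.h>

static DEFINE_SPINLOCK(cc_pool_lock);
static DEFINE_XARRAY(cc_active);        /* DMA page -> slot lookup */
static struct task_struct *kccd_task;
static unsigned long cc_watermark;

/* Illustrative pool helpers; the real patch keeps its own bookkeeping. */
struct cc_slot { phys_addr_t paddr; size_t size; };
extern struct cc_slot *cc_pool_take(size_t size);
extern void cc_pool_put(struct cc_slot *slot);
extern unsigned long cc_pool_level(void);

static struct cc_slot *cc_swiotlb_alloc_slot(size_t size)
{
        struct cc_slot *slot;
        unsigned long flags;

        /* Never allocate memory here; only take buffers prepared by kccd. */
        spin_lock_irqsave(&cc_pool_lock, flags);
        slot = cc_pool_take(size);
        spin_unlock_irqrestore(&cc_pool_lock, flags);
        if (!slot)
                return NULL;

        /*
         * The single allocation on this path: xa_insert() with GFP_NOWAIT
         * never sleeps, but it can fail, so the buffer must go back to
         * the pool on the error path.
         */
        if (xa_insert(&cc_active, slot->paddr >> PAGE_SHIFT, slot,
                      GFP_NOWAIT)) {
                cc_pool_put(slot);
                return NULL;
        }

        /* Wake kccd once the pool drops below half of the watermark. */
        if (cc_pool_level() < cc_watermark / 2)
                wake_up_process(kccd_task);

        return slot;
}

(kccd_task here would simply be the task_struct returned by kthread_run() from the arch_initcall mentioned in point 1.)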

3. After each allocation, the remaining TLB resources are checked against the water level. If they fall below the preset value (half of the watermark), the kthread is woken up to refill the pool.
4. The TLB buffer allocation in the kthread is batched to "(MAX_ORDER_NR_PAGES << PAGE_SHIFT)" to reduce the spinlock hold time and the number of calls to set_memory_decrypted().

Okay, but does this guarantee that it will never run out of memory?

It seems difficult to make such guarantees. What happens for example if the background thread gets starved by something higher priority?

No, this cannot guarantee that we always have sufficient TLB buffers, so we can still hit a "No memory for cc-swiotlb buffer" warning.

But I want to emphasize that even in this case, the new implementation is no worse than the legacy implementation. Moreover, dynamic TLB allocation is better suited to situations where additional disks/network devices may be hotplugged, in which case a reasonable size cannot be preset.
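To make the watermark/refill behaviour in points 3/4 concrete, here is a minimal sketch of the refill loop I have in mind for kccd (again with illustrative helpers cc_pool_level()/cc_pool_add() and cc_watermark; the real patch differs in detail and needs more careful error handling):

#include <linux/kthread.h>
#include <linux/gfp.h>
#include <linux/set_memory.h>

extern void cc_pool_add(phys_addr_t paddr, size_t bytes);

static int kccd(void *unused)
{
        while (!kthread_should_stop()) {
                set_current_state(TASK_INTERRUPTIBLE);
                if (cc_pool_level() >= cc_watermark) {
                        schedule();
                        continue;
                }
                __set_current_state(TASK_RUNNING);

                while (cc_pool_level() < cc_watermark) {
                        /*
                         * Batch the allocation to MAX_ORDER_NR_PAGES pages
                         * so that one set_memory_decrypted() call covers
                         * the whole chunk and the pool spinlock is only
                         * taken once per chunk.
                         */
                        struct page *pg = alloc_pages(GFP_KERNEL,
                                                      MAX_ORDER - 1);

                        if (!pg) {
                                pr_warn_ratelimited("No memory for cc-swiotlb buffer\n");
                                break;
                        }
                        if (set_memory_decrypted((unsigned long)page_address(pg),
                                                 MAX_ORDER_NR_PAGES)) {
                                /*
                                 * Simplified: real code must not blindly
                                 * reuse or free partially converted pages.
                                 */
                                break;
                        }
                        cc_pool_add(page_to_phys(pg),
                                    MAX_ORDER_NR_PAGES << PAGE_SHIFT);
                }
        }
        return 0;
}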

Or if the allocators have such high bandwidth that they can overwhelm any reasonable background thread.

-Andi


Sincerely,
Guorui