Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

From: David Rientjes
Date: Sun Dec 15 2019 - 00:38:28 EST


On Thu, 12 Dec 2019, David Rientjes wrote:

> Since all DMA must be unencrypted in this case, what happens if all
> dma_direct_alloc_pages() calls go through the DMA pool in
> kernel/dma/remap.c when force_dma_unencrypted(dev) == true since
> __PAGE_ENC is cleared for these ptes? (Ignoring for a moment that this
> special pool should likely be a separate dma pool.)
>
> I assume the result would be general depletion of that atomic pool, so
> DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient. I'm not sure what
> size a DMA pool wired up for this specific purpose would need to be, so
> I assume dynamic resizing is required.
>
> It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the
> ability to do background expansion of the atomic pool when it nears its
> capacity for this purpose, should it? I imagine the only blocker to
> dynamic expansion is being unable to allocate pages within the DMA mask,
> and we don't oom kill for lowmem. But perhaps vm.lowmem_reserve_ratio is
> good enough protection?
>
> Beyond that, I'm not sure what sizing would be appropriate if this is to
> be a generic solution in the DMA API for all devices that may require
> unencrypted memory.
>

Optimizations involving lowmem reserve ratio aside, is it ok that
CONFIG_AMD_MEM_ENCRYPT develops a dependency on DMA_DIRECT_REMAP because
set_memory_decrypted() must be allowed to block?

If so, dma_direct_alloc_pages() could allocate from the atomic pool when it
can't block and the device requires unencrypted DMA. I assume this needs
to be its own atomic pool, used specifically for force_dma_unencrypted()
devices, and dma_direct_free_pages() would need to check addr_in_gen_pool()
against this new unencrypted pool.
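A minimal userspace sketch of the routing described above, assuming a dedicated unencrypted pool. This is not kernel code: struct fake_dev, struct fake_pool, choose_source() and addr_in_pool() are hypothetical stand-ins that model force_dma_unencrypted(), the "can we block" decision, and addr_in_gen_pool() rather than calling the real functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model; force_unencrypted stands in for force_dma_unencrypted(dev). */
struct fake_dev {
	bool force_unencrypted;
};

enum alloc_source { FROM_PAGE_ALLOCATOR, FROM_UNENCRYPTED_POOL };

/* Hypothetical: decide where a coherent allocation should come from. */
static enum alloc_source choose_source(const struct fake_dev *dev, bool can_block)
{
	assert(dev != NULL);
	/*
	 * set_memory_decrypted() may block, so a non-blocking request from a
	 * device that needs unencrypted memory cannot decrypt fresh pages
	 * inline; serve it from a pool whose pages were decrypted up front.
	 */
	if (dev->force_unencrypted && !can_block)
		return FROM_UNENCRYPTED_POOL;
	return FROM_PAGE_ALLOCATOR;
}

/* Hypothetical pool range, standing in for the gen_pool backing the unencrypted pool. */
struct fake_pool {
	uintptr_t start;
	size_t size;
};

/*
 * Models the addr_in_gen_pool() check on the free path: if the whole
 * buffer lies inside the pool, it must be returned to the pool instead
 * of being re-encrypted and freed to the page allocator.
 */
static bool addr_in_pool(const struct fake_pool *p, uintptr_t addr, size_t size)
{
	return addr >= p->start && size <= p->size &&
	       addr - p->start <= p->size - size;
}
```

Blocking allocations keep the existing page allocator path; only the atomic case for encrypted guests is diverted to the pool.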

I have no idea how large this unencrypted atomic pool should be. We could
determine a sensible default size and growth increment for nvme itself,
but as Christoph mentioned, many drivers require non-blockable allocations
and can run inside a SEV encrypted guest.

A trivial implementation would be to double the size of the unencrypted
pool when it reaches half capacity, perhaps with GFP_KERNEL | __GFP_DMA
allocations in a workqueue. We can reclaim from ZONE_DMA or ZONE_DMA32 in
this context, but when that fails I'm not sure whether it's satisfactory
to just fail the dma_pool_alloc() when the unencrypted pool runs out.
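The doubling heuristic can be modeled in a few lines. This is a hypothetical userspace sketch, not kernel code: struct unenc_pool, pool_alloc_atomic() and pool_expand_worker() are invented names, expand_pending stands in for queueing a work item, and reclaim_ok stands in for whether the GFP_KERNEL | __GFP_DMA allocation in the worker succeeded. When the pool is full, the atomic allocation simply fails, which is the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the unencrypted atomic pool; sizes in bytes. */
struct unenc_pool {
	size_t capacity;
	size_t used;
	bool expand_pending;	/* models a queued expansion work item */
};

/* Atomic-context allocation: never blocks, fails when the pool is full. */
static bool pool_alloc_atomic(struct unenc_pool *p, size_t size)
{
	if (size > p->capacity - p->used)
		return false;			/* exhausted: caller sees failure */
	p->used += size;
	/* Crossing half capacity kicks off background expansion once. */
	if (!p->expand_pending && p->used >= p->capacity / 2)
		p->expand_pending = true;
	return true;
}

/*
 * Worker body (process context, may block). On success the pool doubles;
 * on failure it stays the same size and the next allocation that finds it
 * at least half full queues another attempt.
 */
static void pool_expand_worker(struct unenc_pool *p, bool reclaim_ok)
{
	assert(p->capacity > 0);
	if (reclaim_ok)
		p->capacity *= 2;
	p->expand_pending = false;
}
```

Doubling keeps the number of expansions logarithmic in the final pool size, at the cost of possibly over-reserving low memory, which is where vm.lowmem_reserve_ratio would come into play.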

Heuristics can be tweaked, of course, but I want to make sure I'm not
missing anything obvious with this approach before implementing it.
Please let me know, thanks.