Re: [BUG] Circular locking dependency - DRM/CMA/MM/hotplug/...

From: Laura Abbott
Date: Fri Mar 21 2014 - 14:47:43 EST


On 2/18/2014 9:44 AM, Michal Nazarewicz wrote:
>> On 2014-02-12 17:33, Russell King - ARM Linux wrote:
>>> What if we did these changes:
>>>
>>> struct page *dma_alloc_from_contiguous(struct device *dev, int count,
>>> unsigned int align)
>>> {
>>> ...
>>> mutex_lock(&cma_mutex);
>>> ...
>>> for (;;) {
>>> pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
>>> start, count, mask);
>>> if (pageno >= cma->count)
>>> break;
>>>
>>> pfn = cma->base_pfn + pageno;
>>> + bitmap_set(cma->bitmap, pageno, count);
>>> + mutex_unlock(&cma_mutex);
>>> ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
>>> + mutex_lock(&cma_mutex);
>>> if (ret == 0) {
>>> - bitmap_set(cma->bitmap, pageno, count);
>>> page = pfn_to_page(pfn);
>>> break;
>>> - } else if (ret != -EBUSY) {
>>> + }
>>> + bitmap_clear(cma->bitmap, pageno, count);
>>> + if (ret != -EBUSY) {
>>> break;
>>> }
>>> ...
>>> mutex_unlock(&cma_mutex);
>>> pr_debug("%s(): returned %p\n", __func__, page);
>>> return page;
>>> }
>
> Like Marek said, this will fail if two concurrent calls to
> alloc_contig_range are made such that they operate on the same pageblock
> (which is possible as the allocated regions do not need to be pageblock
> aligned).
>
> Another mutex could be added just for this one call, but as I understand
> this would not solve the problem.
>
>>> bool dma_release_from_contiguous(struct device *dev, struct page *pages,
>>> int count)
>>> {
>>> ...
>>> + free_contig_range(pfn, count);
>>> mutex_lock(&cma_mutex);
>>> bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
>>> - free_contig_range(pfn, count);
>>> mutex_unlock(&cma_mutex);
>>> ...
>>> }
>
> This *should* be fine. Didn't test it.
>

I managed to hit a different deadlock that had a similar root cause.
I also managed to independently come up with a similar solution. This
has been tested somewhat but not in wide distribution.

Thanks,
Laura

----- 8< ------