Re: [PATCH 3/5] dmaengine: plx-dma: Introduce PLX DMA engine PCI driver skeleton

From: Logan Gunthorpe
Date: Tue Nov 12 2019 - 12:22:54 EST




On 2019-11-11 11:09 p.m., Vinod Koul wrote:
> On 11-11-19, 10:50, Logan Gunthorpe wrote:
>>
>>
>> On 2019-11-09 10:35 a.m., Vinod Koul wrote:
>>> On 22-10-19, 15:46, Logan Gunthorpe wrote:
>>>> +static irqreturn_t plx_dma_isr(int irq, void *devid)
>>>> +{
>>>> + return IRQ_HANDLED;
>>>
>>> ??
>>
>> Yes, sorry this is more of an artifact of how I chose to split the
>> patches up. The ISR is filled-in in patch 4.
>
> lets move this code in all including isr registration in patch 4 then :)

Ok, I'll rework that for the next submission.

>>>> + */
>>>> + schedule_work(&plxdev->release_work);
>>>> +}
>>>> +
>>>> +static void plx_dma_put(struct plx_dma_dev *plxdev)
>>>> +{
>>>> + kref_put(&plxdev->ref, plx_dma_release);
>>>> +}
>>>> +
>>>> +static int plx_dma_alloc_chan_resources(struct dma_chan *chan)
>>>> +{
>>>> + struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan);
>>>> +
>>>> + kref_get(&plxdev->ref);
>>>
>>> why do you need to do this?
>>
>> This has to do with being able to probably unbind while a channel is in
>> use. If we don't hold a reference to the struct plx_dma_dev between
>> alloc_chan_resources() and free_chan_resources() then it will panic if a
>> call back is called after plx_dma_remove(). The way I've done it, once a
>
> which callback?

Any callback that tries to obtain the free'd plx_dma_dev structure. (So
plx_dma_free_chan_resources(), plx_dma_prep_memcpy(),
plx_dma_issue_pending(), plx_dma_tx_status()).

>> device is removed, subsequent calls to dma_prep_memcpy() will fail (see
>> ring_active).
>>
>> struct plx_dma_dev needs to be alive between plx_dma_probe() and
>> plx_dma_remove(), and between calls to alloc_chan_resources() and
>> free_chan_resources(). So we use a reference count to ensure this.
>
> and that is why we hold module reference so we don't go away without
> cleanup

No, that's wrong. The module reference will prevent the module and the
functions within it from going away. It doesn't prevent the driver from
being unbound which normally causes the devices' structure from being
freed. Most drivers will free the structure containing the DMA engine on
the remove() call, so even if the module is still around, its functions
will still be called with a freed pointer. We're taking a reference on
the pointer to ensure it's not freed while dmaengine users still have a
reference to it.

>>>> +static void plx_dma_release_work(struct work_struct *work)
>>>> +{
>>>> + struct plx_dma_dev *plxdev = container_of(work, struct plx_dma_dev,
>>>> + release_work);
>>>> +
>>>> + dma_async_device_unregister(&plxdev->dma_dev);
>>>> + put_device(plxdev->dma_dev.dev);
>>>> + kfree(plxdev);
>>>> +}
>>>> +
>>>> +static void plx_dma_release(struct kref *ref)
>>>> +{
>>>> + struct plx_dma_dev *plxdev = container_of(ref, struct plx_dma_dev, ref);
>>>> +
>>>> + /*
>>>> + * The dmaengine reference counting and locking is a bit of a
>>>> + * mess so we have to work around it a bit here. We might put
>>>> + * the reference while the dmaengine holds the dma_list_mutex
>>>> + * which means we can't call dma_async_device_unregister() directly
>>>> + * here and it must be delayed.
>>>
>>> why is that, i have not heard any complaints about locking, can you
>>> elaborate on why you need to do this?
>>
>> Per the above explanation, we need to call plx_dma_put() in
>> plx_dma_free_chan_resources(); and plx_dma_release() is when we can call
>> dma_async_device_unregister() (seeing that's when we know there are no
>> longer any active channels).
>>
>> However, dma_chan_put() (which calls device_free_chan_resources()) holds
>> the dma_list_mutex and dma_async_device_unregister() tries to take the
>> dma_list_mutex so, if we call unregister inside free_chan_resources we
>> would deadlock.
>
> yes as we are not expecting someone to unregister in
> device_free_chan_resources(), that is for freeing up resources.
>
> You are expected to unregister in .remove!
>
> Can you explain me why unregister cant be done in remove? I think I am
> still missing some detail for this case.

Because, if the user unbinds while there's a client of the dma channel,
then it panics the kernel. First there's the warning[1] I pointed out
previously, then the DMA channel users will cause a use after free
exception when they continue unaware that the memory they are using has
been freed.

For an example from a random driver:

1) owl_dma_probe() allocates it's struct owl_dma with devm_kzalloc()
2) Another driver calls dma_find_channel() and obtains a reference to
one of the channels
3) Asynchronously, the user unbinds the owl_dma driver using the sysfs
interface
4) owl_dma_remove() is called which calls dma_async_device_unregister()
which produces a WARN_ON because a channel is in use
5) The devm stack for this driver instance unwinds and the struct
owl_dma is freed
6) The client driver then calls dmaengine_prep_dma_memcpy() which calls
owl_dma_prep_memcpy(). The first thing that driver does is convert the
now invalid channel reference to an invalid struct owl_dma reference and
shortly thereafter dereferences the now freed memory. If KASAN is
enabled, the user will get a big use after free bug panic. If not, the
driver will read and write memory that may be used by some other random
process eventually causing other random fatal errors in the system. The
best case scenario is the process that allocates the already freed
memory zeros it, and thus the client driver would panic on a NULL
pointer dereference.

I think this is unacceptable for a driver to have happen and that's why
I wrote the plx driver such that it is not possible. This is especially
important for the PLX driver because it is a PCI device which can be
hotplugged so users may actually be randomly trying to unbind it.

Logan


[1]
https://elixir.bootlin.com/linux/latest/source/drivers/dma/dmaengine.c#L1119