Re: [PATCH] DMA: PL330: Fix racy mutex unlock

From: Javi Merino
Date: Wed Jun 13 2012 - 14:36:49 EST


On Wed 13 Jun 2012 16:13:05 BST, Jassi Brar wrote:
On 13 June 2012 19:37, Javi Merino<javi.merino@xxxxxxx> wrote:
pl330_update() stores a pointer to the thrd->req that finished, which
contains a pointer to the corresponding pl330_req. This is done with
the pl330_lock held. Then, it iterates through the req_done list,
calling the callback for each of the requests that are done. The
problem is that the driver releases the lock before calling the
callback for each of the callbacks. pl330_submit_req() running in
another processor can then acquire the lock and insert another request
in one of the thrd->req that hasn't been processed yet, replacing the
pointer to pl330_req there. When the callback returns in
pl330_update() and the next rqdone is popped from the list, it
dereferences the pl330_req pointer to the just scheduled pl330_req,
instead of the one that has finished, calling pl330 with the wrong r.

This patch fixes this by storing the pointer to pl330_req directly in
the list.

.....
@@ -1683,7 +1683,7 @@ static void pl330_dotask(unsigned long data)
/* Returns 1 if state was updated, 0 otherwise */
static int pl330_update(const struct pl330_info *pi)
{
- struct _pl330_req *rqdone;
+ struct pl330_req *rqdone, *tmp;
struct pl330_dmac *pl330;
unsigned long flags;
void __iomem *regs;
@@ -1750,7 +1750,10 @@ static int pl330_update(const struct pl330_info *pi)
if (active == -1) /* Aborted */
continue;

- rqdone =&thrd->req[active];
+ /* Detach the req */
+ rqdone = thrd->req[active].r;
+ thrd->req[active].r = NULL;
+
Doesn't this movement of "Detach the req" chunk effectively remain the
same? Since that was already protected by the same lock. I thought I
deliberately took care of that already.

Yes, but you release the lock before calling the first callback, so the subsequent dereferences of rqdone->r are not protected by the lock.

Do you see some real problem fixed by this patch? Info about that
could help me better understand if I missed something here.

Ok, the description sucks. Let me try to describe it with the scenario that failed:

Core 0:
- Two DMA transactions finish, in channels 0 and 1.
- pl330_update() is called, the "Event-Interrupt Status Register" (ES)
is 0x3.
- In the "for (ev = 0;..." loop
+ two pointers are stored in pl330->req_done:
pl330->channels[0]->req[0] and pl330->channels[1]->req[0]
- In the "while (!list_empty.." loop,
+ r = pl330->channels[0]->req[0]->r
+ Release the pl330_lock and call _callback()

Core 1:
- pl330_submit_req() for channel 1
- Grab the pl330_lock
- Hook a request in pl330->channels[1]->req[0]->r
- Release the pl330_lock

Core 0:
- _callback() returns
- Acquire the pl330_lock again
- second iteration of "while (!list_empty.." loop,
+ r = pl330->channels[1]->req[0]->r , but you get the r that has just
been scheduled, not the one that finished.

Hope it's now clearer,
Javi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/