Re: BUG in drivers/dma/ioat/dma_v2.c:314

From: Dan Williams
Date: Mon Jun 28 2010 - 20:45:33 EST


On 6/28/2010 4:50 PM, Chris Li wrote:
Hi Dan,

My Mac Pro hits this BUG every time it tries to load the ioatdma module.

This was first discovered in the FC 12 & 13 kernels. See Red Hat bug 605845:
https://bugzilla.redhat.com/show_bug.cgi?id=605845. I attached a picture
of the kernel panic to the bug.

The current git tree has it as well. The line number of the bug has
changed a little, though.


/* when halted due to errors check for channel
 * programming errors before advancing the completion state
 */
if (is_ioat_halted(status)) {
	u32 chanerr;

	chanerr = readl(chan->reg_base + IOAT_CHANERR_OFFSET);
	dev_err(to_dev(chan), "%s: Channel halted (%x)\n",
		__func__, chanerr);
	BUG_ON(is_ioat_bug(chanerr)); <---------------------------------
}

The machine is a Mac Pro. The bug is 100% reproducible. If I blacklist the
ioatdma module, the kernel boots just fine.

Any suggestions? I am not afraid to try out patches.


Looks like that dev_err() did not make it to the console. The attached patch (against current -git) should get us some more debug information. Note that it stops the driver from making forward progress once the channel is reported halted. I suspect this may be triggering from the driver self test, but to be safe you should set CONFIG_NET_DMA=n and CONFIG_ASYNC_TX_DMA=n.
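
To make the intended behaviour change concrete, here is a rough userspace sketch, not the driver code. warn_once(), fake_is_bug(), FAKE_BUG_MASK and timer_event() are hypothetical stand-ins for WARN_ONCE(), is_ioat_bug() and ioat2_timer_event(), and the mask value is made up. It only illustrates that the halted path now reports a programming error once and returns, instead of calling BUG_ON().

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static int warn_count;	/* number of times the warning was actually printed */
static int progress;	/* number of times the handler got past the halt check */

/* Hypothetical stand-in for WARN_ONCE(): print only the first time cond is true. */
static bool warn_once(bool cond, const char *func, uint32_t chanerr)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "%s: Channel halted (%" PRIx32 ")\n", func, chanerr);
	}
	return cond;
}

/* Hypothetical stand-in for is_ioat_bug(); this mask is an assumption. */
#define FAKE_BUG_MASK 0x1u

static bool fake_is_bug(uint32_t chanerr)
{
	return (chanerr & FAKE_BUG_MASK) != 0;
}

/* Sketch of the patched halted path: warn once, then return early. */
static void timer_event(bool halted, uint32_t chanerr)
{
	if (halted) {
		warn_once(fake_is_bug(chanerr), __func__, chanerr);
		return;	/* no forward progress while the channel is halted */
	}
	progress++;
}
```

One consequence worth keeping in mind: unlike the unconditional dev_err() it replaces, WARN_ONCE() only prints when its condition is true, so a halt whose CHANERR value does not match is_ioat_bug() will be silent with this patch.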

--
Dan

diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 3c8b32a..89bff46 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -285,9 +285,9 @@ void ioat2_timer_event(unsigned long data)
 			u32 chanerr;

 			chanerr = readl(chan->reg_base + IOAT_CHANERR_OFFSET);
-			dev_err(to_dev(chan), "%s: Channel halted (%x)\n",
-				__func__, chanerr);
-			BUG_ON(is_ioat_bug(chanerr));
+			WARN_ONCE(is_ioat_bug(chanerr), "%s: %s: Channel halted (%x)\n",
+				  dev_name(to_dev(chan)), __func__, chanerr);
+			return;
 		}

 		/* if we haven't made progress and we have already