Re: PROBLEM REMAINS: [sata_nv ADMA breaks ATAPI] Crash on accessingDVD-RAM

From: Robert Hancock
Date: Sat Jan 12 2008 - 20:39:04 EST


James Bottomley wrote:
On Sat, 2008-01-12 at 17:04 -0600, Robert Hancock wrote:
I don't think the problem is that there's some buffer which is getting allocated above 4GB and never bounced, since the problem goes away if ADMA is disabled entirely and the DMA mask remains 32-bit always. My guess is something is basing its decision on whether to bounce or not on the device DMA mask. That can't possibly work properly for sata_nv since the same PCI device has 2 ports, one of which can be in ADMA mode and 64-bit capable and the other can be in legacy mode and only 32-bit capable.

Erm, well, you can't decouple them. Having a differing blk queue bounce
mask and device mask is going to cause huge trouble. The reason is this
insidious nasty called swiotlb. Basically, with it enabled (and again,
it can be on ia64 or x86_64), the kernel can bypass the bounce limit
safe in the knowledge that swiotlb will fix up behind in the dma_map_
Unfortunately, if the device mask doesn't match the queue mask then
swiotlb will never kick in and you'll end up with mapped pages beyond
the 4GB limit.

Yuck.. All the IOMMU DMA mapping code checks against the device DMA mask, so it looks like if we get to the point of doing the DMA mapping on >4GB addresses in libata we're screwed with this approach.

The key problem is that both ata_ports share the same struct device with one DMA mask which really doesn't match what this controller wants. I wonder if we could do a different struct device for each port?

Other than that, I guess the solutions would be to just set a 32-bit mask on the device if either port has an ATAPI device connected (which is fairly ugly, considering that you could do things like hotplug an ATAPI device when the other port was in use, for example), or do something to prevent requests from reaching this point with >4GB addresses in the first place..


Tejun, I believe you had a patch that was printing warnings when libata tried to program a legacy PRD with an address over 4GB. Could we change that to WARN_ON and get someone experiencing this to try it and
see what the stack trace points to?

Unfortunately, the stack trace probably won't help, since the command
likely gets issued from the block request function, so the trace won't
go back to the culpable initiator; that's why the command would be
helpful.

Well, dumping the ATA command surely isn't helpful, as I'm sure it will be PACKET. I guess we'd have to dump out the actual CDB..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/