Re: sata_sil24 0000:04:00.0: DMA-API: device driver frees DMA sglist with different entry count [map count=13] [unmap count=10]

From: FUJITA Tomonori
Date: Thu Jun 04 2009 - 18:43:28 EST


On Thu, 4 Jun 2009 20:07:36 +0200
Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx> wrote:

> On Thu, Jun 4, 2009 at 9:53 AM, Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> > On Thu, Jun 04 2009, FUJITA Tomonori wrote:
> >> On Thu, 04 Jun 2009 10:15:14 +0300
> >> Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
> >>
> >> > On 06/04/2009 09:33 AM, FUJITA Tomonori wrote:
> >> > > On Thu, 4 Jun 2009 08:12:34 +0200
> >> > > Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx> wrote:
> >> > >
> >> > >> On Thu, Jun 4, 2009 at 2:02 AM, FUJITA Tomonori
> >> > >> <fujita.tomonori@xxxxxxxxxxxxx> wrote:
> >> > >>> On Wed, 3 Jun 2009 21:30:32 +0200
> >> > >>> Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx> wrote:
> >> > >>>> Still happens with 2.6.30-rc8 (see trace at the end of the email)
> >> > >>>>
> >> > >>>> As orig_n_elem is only used twice in libata-core.c, I suspected a
> >> > >>>> corruption of qc->sg, but the checks I added for this did not trigger.
> >> > >>>> So I looked into lib/dma-debug.c.
> >> > >>>> It seems add_dma_entry() does not protect against adding the same
> >> > >>>> entry twice.
> >> > >>> You mean that add_dma_entry() doesn't protect against adding a new
> >> > >>> entry identical to an existing entry, right?
> >> > >> Yes; as I read the hash bucket code in lib/dma-debug.c, a second entry
> >> > >> with the same device and the same address will just be added to the
> >> > >> list, and on unmap the lookup will always return the first entry.
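
For reference, the lookup and the check behind the message in the
subject line are essentially the following (a simplified sketch of the
2.6.30-era lib/dma-debug.c with locking and reporting stripped out;
not a verbatim copy):

/*
 * Two mappings with the same (dev, dev_addr) land in the same hash
 * bucket, and the search stops at the first match -- so an unmap can
 * pair up with the *other* mapping's entry.
 */
static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
						struct dma_debug_entry *ref)
{
	struct dma_debug_entry *entry;

	list_for_each_entry(entry, &bucket->list, list) {
		if (entry->dev == ref->dev &&
		    entry->dev_addr == ref->dev_addr)
			return entry;	/* first match wins */
	}

	return NULL;
}

When check_unmap() then compares the scatterlist entry count stored in
the found entry against the one passed to dma_unmap_sg(), the counts
disagree and it reports "frees DMA sglist with different entry count
[map count=13] [unmap count=10]" even though the driver did nothing
wrong.
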
> >> > >
> >> > > It means that two different DMA operations will be performed against
> >> > > the same dma address on the same device at the same time. That doesn't
> >> > > happen unless there is a bug in a driver, an IOMMU, or somewhere else,
> >> > > as I wrote in the previous mail.
> >> > >
> >> >
> >> > What about the drain buffers used by libata? Aren't they the same buffer
> >> > for all devices and all requests?
> >>
> >> I'm not sure the drain buffer is used like that. But are there
> >> easier ways to see the same buffer, e.g. sending the same buffer twice
> >> with DIO?
> >
> > I'm pretty sure we discussed this some months ago; the Intel IOMMU
> > driver had a similar bug, IIRC. Let's say you want to write the same 4kb
> > block to two spots on the disk. You prepare and submit that with
> > O_DIRECT, using aio. On a device with NCQ, that could easily map the
> > same page twice. Or, perhaps more likely, doing 512b writes and not
> > getting all of them merged.
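
Something like the following would generate that pattern (a minimal
userspace sketch using libaio; the file name, offsets and sizes are
made-up illustration values, and error handling is omitted):

/* Queue two O_DIRECT writes of the *same* 4kb buffer to two offsets.
 * With NCQ both writes can be in flight at once, so the block layer
 * may DMA-map the same page twice at the same time.
 */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb[2], *cbs[2] = { &cb[0], &cb[1] };
	struct io_event ev[2];
	void *buf;
	int fd = open("disk.img", O_WRONLY | O_DIRECT); /* placeholder path */

	posix_memalign(&buf, 4096, 4096);	/* O_DIRECT needs alignment */
	memset(buf, 0xab, 4096);

	io_setup(8, &ctx);
	io_prep_pwrite(&cb[0], fd, buf, 4096, 0);	/* spot #1 */
	io_prep_pwrite(&cb[1], fd, buf, 4096, 8192);	/* spot #2 */
	io_submit(ctx, 2, cbs);			/* both queued at once */
	io_getevents(ctx, 2, 2, ev, NULL);

	io_destroy(ctx);
	return 0;
}
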
>
> I have an even better theory: RAID1
> There are two disks on this sil24 controller that are used as a RAID1
> to form my root partition.
>
> That also fits the pattern of the very large number of duplicate dma
> mappings (as each data block needs to be written twice), and explains
> why the DMA-API debug check only triggers under heavier load: Most of
> the time both drives are in sync, so the write requests should be
> identical and it does not matter which entry gets returned from the
> hash bucket.
> But when I run 'updatedb' to trigger this error, the reads disturb
> the pattern and the write requests become asymmetric.
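
That matches what the md raid1 write path does. Schematically (a
heavily simplified sketch of drivers/md/raid1.c; type and field names
are abbreviated and error paths dropped, so this is not the real
code):

/*
 * A raid1 write clones one bio per mirror, all pointing at the same
 * data pages.  With both mirrors behind the same sil24 controller,
 * the same page gets DMA-mapped twice against the same PCI device.
 */
static void raid1_write(struct r1conf *conf, struct bio *master_bio)
{
	int i;

	for (i = 0; i < conf->raid_disks; i++) {
		struct bio *mbio = bio_clone(master_bio, GFP_NOIO);

		mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
		generic_make_request(mbio);	/* same pages, per disk */
	}
}
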
>
> >> As I wrote, I assume that he uses GART IOMMU;
>
> [ 0.010000] Checking aperture...
> [ 0.010000] No AGP bridge found
> [ 0.010000] Node 0: aperture @ a7f0000000 size 32 MB
> [ 0.010000] Aperture beyond 4GB. Ignoring.
> [ 0.010000] Your BIOS doesn't leave a aperture memory hole
> [ 0.010000] Please enable the IOMMU option in the BIOS setup
> (sadly my BIOS does not have such an option...)
> [ 0.010000] This costs you 64 MB of RAM
> [ 0.010000] Mapping aperture over 65536 KB of RAM @ 20000000
> [ 0.010000] Memory: 4057512k/4718592k available (4674k kernel code,
> 524868k absent, 136212k reserved, 2520k data, 1172k init)
> [snip]
> [ 1.304386] DMA-API: preallocated 32768 debug entries
> [ 1.309439] DMA-API: debugging enabled by kernel config
> [ 1.310123] PCI-DMA: Disabling AGP.
> [ 1.313711] PCI-DMA: aperture base @ 20000000 size 65536 KB
> [ 1.320002] PCI-DMA: using GART IOMMU.
> [ 1.323763] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> [ 1.330640] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
> [ 1.340007] hpet0: 3 comparators, 32-bit 25.000000 MHz counter

You use the GART IOMMU, so I thought that you shouldn't hit this
problem, because an IOMMU gives a unique dma address per dma
mapping... but I forgot one really important thing about GART: it's
not real IOMMU hardware. It does address remapping only when
necessary (i.e., when an address cannot be reached by the device
directly). So it's possible that you see multiple DMA transfers
performed against the same dma address on one device at the same
time.
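
In code, the decision looks roughly like this (a paraphrased sketch of
the GART path in arch/x86/kernel/pci-gart_64.c around 2.6.30;
map_into_aperture() is a hypothetical stand-in for the real aperture
allocation path):

/*
 * Buffers the device can already reach are passed through 1:1, so
 * mapping the same low page twice yields the *same* dma address.
 */
static dma_addr_t gart_map(struct device *dev, phys_addr_t paddr,
			   size_t size)
{
	/* Already below the device's DMA mask?  Identity-map it. */
	if (paddr + size <= *dev->dma_mask + 1)
		return paddr;

	/* Otherwise remap through the GART aperture. */
	return map_into_aperture(dev, paddr, size);	/* hypothetical */
}
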


> >> it allocates a unique
> >> dma address per dma mapping operation.
> >>
> >> However, dma-debug is broken wrt this, I guess.
> >
> > Seems so.
>
> Yes, as the md code for RAID1 has a very good reason to send the same
> memory page twice to this device.

Yeah, now it's clear to me why you hit this bug.

I'm not sure there is any simple way to fix dma-debug wrt this. I
think that it's better to just disable the check, since 2.6.30 will
be released shortly.