Re: [PATCH] dma-debug: New interfaces to debug dma mapping errors

From: Shuah Khan
Date: Tue Sep 18 2012 - 16:34:56 EST


On Tue, 2012-09-18 at 15:45 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 18, 2012 at 01:42:49PM -0600, Shuah Khan wrote:
> > On Tue, 2012-09-18 at 15:34 +0200, Joerg Roedel wrote:
> > > On Mon, Sep 17, 2012 at 04:45:15PM -0600, Shuah Khan wrote:
> > > > Yeah. I will firm up my ideas a bit and summarize in a day or two. Would
> > > > like to hear your ideas as well at that time, so we can pick the one
> > > > that works the best.
> > >
> > > I think the best approach for this functionality is to add a flag to
> > > 'struct dma_debug_entry' which tells whether the address has been
> > > checked with dma_mapping_error() or not. On unmap or driver unload you can
> > > then check for that flag and print a warning when an unchecked address
> > > is detected.
> >
> > Was hoping to get comments from you as well. You are the original author
> > of this dma-debug module.
> >
> > Are you ok with the system wide and per device error counts I added? Any
> > comments on the overall approach?
> >
> > The approach you suggested will cover the cases where drivers fail to
> > check good map cases. We won't be able to catch failed maps that get used
> > without checks. Are you not concerned about these cases? These could
> > cause a silent error with wild writes or could bring the system down. Or
> > are you recommending changing the infrastructure to track failed maps as
> > well?
> >
> > I am still pursuing a way to track failed map cases. I combined the flag
> > idea with one of the ideas I am looking into. Details below: (if this
> > sounds like a reasonable approach, I can do v2 patch and we can discuss
> > the code)
> >
> > . Add new fields dma_map_errors, dma_map_errors_not_checked,
> > dma_unmap_errors, iotlb_overflow_cnt, and flag to struct
> > dma_debug_entry. Maybe flag is not even needed if
> > dma_map_errors_not_checked can double as status.
>
> Not sure if you need the iotlb_overflow_cnt anymore. Just having
> dma_map_errors_not_checked and the dma_map_errors
> (which you can increment/decrement) would suffice. Unless you
> were thinking to check that dma_map_errors == dma_unmap_errors and
> if they != then produce a warning?

Right. I wasn't thinking about that, but I get it. We don't need
iotlb_overflow_cnt, as it is included in the failed map count. What I
meant was that dma_map_errors_not_checked > 0 carries the same status the
flag is intended to track, so it could be the trigger for the warning. But
that is not going to work, because it would generate warnings as soon as
dma_map_errors_not_checked becomes > 0, and it stays that way. Need the
flag. :) So dropping iotlb_overflow_cnt and keeping the status flag.
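
To illustrate why the counter alone cannot serve as the trigger, here is a
rough user-space sketch (not kernel code). All names in it (struct
debug_entry, debug_map(), debug_mapping_error(), debug_unmap(),
counter_would_warn()) are made up for this mail, and I am assuming the
not-checked count is bumped at unmap time; the actual v2 patch may look
quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; names are illustrative, not the kernel's. */
enum map_err_state {
	MAP_ERR_NOT_CHECKED,	/* driver has not checked the mapping result */
	MAP_ERR_CHECKED		/* driver called the mapping-error check */
};

struct debug_entry {
	unsigned long dma_addr;
	bool map_failed;		/* did the map operation fail? */
	enum map_err_state state;	/* the per-entry status flag */
};

struct debug_stats {
	unsigned int map_errors;		/* failed maps seen so far */
	unsigned int map_errors_not_checked;	/* unchecked entries seen at unmap */
	unsigned int warnings;			/* warnings printed so far */
};

/* Record a new mapping; every entry starts out unchecked. */
static void debug_map(struct debug_stats *st, struct debug_entry *e,
		      unsigned long addr, bool failed)
{
	e->dma_addr = addr;
	e->map_failed = failed;
	e->state = MAP_ERR_NOT_CHECKED;
	if (failed)
		st->map_errors++;
}

/* Model of the driver checking the result: mark the entry as checked. */
static bool debug_mapping_error(struct debug_entry *e)
{
	e->state = MAP_ERR_CHECKED;
	return e->map_failed;
}

/* On unmap, warn only if this particular entry was never checked. */
static bool debug_unmap(struct debug_stats *st, const struct debug_entry *e)
{
	if (e->state == MAP_ERR_CHECKED)
		return false;

	st->map_errors_not_checked++;
	st->warnings++;
	fprintf(stderr, "warning: addr 0x%lx unmapped without error check\n",
		e->dma_addr);
	return true;
}

/* The rejected trigger: once the count is > 0 it stays that way. */
static bool counter_would_warn(const struct debug_stats *st)
{
	return st->map_errors_not_checked > 0;
}
```

With this model, once one entry is unmapped unchecked, counter_would_warn()
stays true for every later unmap, even for entries the driver did check.
The per-entry flag in debug_unmap() only fires for the offending entry.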

-- Shuah
