Re: [PATCH v2 1/2] iommu/vt-d: Ratelimit fault handler

From: Alex Williamson
Date: Thu Mar 17 2016 - 16:46:50 EST


On Thu, 17 Mar 2016 13:33:30 -0700
Joe Perches <joe@xxxxxxxxxxx> wrote:

> On Thu, 2016-03-17 at 14:12 -0600, Alex Williamson wrote:
> > Fault rates can easily overwhelm the console and make the system
> > unresponsive.  Ratelimit to allow an opportunity for maintenance.
> []
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> []
> > @@ -1602,10 +1602,17 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
> >   int reg, fault_index;
> >   u32 fault_status;
> >   unsigned long flag;
> > + bool ratelimited;
> > + static DEFINE_RATELIMIT_STATE(rs,
> > +       DEFAULT_RATELIMIT_INTERVAL,
> > +       DEFAULT_RATELIMIT_BURST);
>
> Are these the appropriate limits for dmar?
>
> include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_INTERVAL    (5 * HZ)
> include/linux/ratelimit.h:#define DEFAULT_RATELIMIT_BURST       10

They seem OK to me. I've got a test running that continuously generates
DMA read faults, and I get 20 lines of log every 5 seconds. That seems
like enough to know there's an issue, that it's ongoing, and maybe to
see some patterns in the fault addresses. We could probably turn up the
burst value, but generally when I look at the logs I'm only checking
things like whether it's a single target address, whether the addresses
are sequential, or what the general address space is, to judge whether
it should or should not be a valid fault address. Thanks,

Alex
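
For anyone unfamiliar with the interval/burst semantics discussed above,
here is a minimal user-space sketch of the idea. It is not the kernel's
implementation: struct rl_state, rl_allow() and rl_count_logged() are
hypothetical names made up for illustration, and time is a simulated
tick counter rather than jiffies. The assumption is simply that up to
"burst" events are allowed per "interval" window and the rest are
suppressed until the window expires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a ratelimit state (not kernel code). */
struct rl_state {
	long interval;	/* window length, in simulated ticks */
	int burst;	/* events allowed per window */
	long begin;	/* tick at which the current window opened */
	int printed;	/* events allowed so far in this window */
	bool started;	/* false until the first event opens a window */
};

/* Return true if an event at tick 'now' may be logged, false if suppressed. */
static bool rl_allow(struct rl_state *rs, long now)
{
	/* Open a fresh window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Feed one fault per tick, starting at tick 0, and count how many get logged. */
static int rl_count_logged(long interval, int burst, long nfaults)
{
	struct rl_state rs = { .interval = interval, .burst = burst };
	int logged = 0;

	assert(interval > 0 && burst >= 0);
	for (long t = 0; t < nfaults; t++)
		if (rl_allow(&rs, t))
			logged++;
	return logged;
}
```

With a 5000-tick window (think 5 seconds at one tick per millisecond)
and a burst of 10, rl_count_logged(5000, 10, 10000) returns 20: ten
events in each of the two windows. Note that the burst counts allowed
events, not lines, so a handler that prints more than one line per
allowed event logs more lines per window than the burst value.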