Re: [PATCH] Prevent nested interrupts when the IRQ stack is near overflowing v2

From: Thomas Gleixner
Date: Thu Mar 25 2010 - 10:17:02 EST


On Thu, 25 Mar 2010, Andi Kleen wrote:
> > > > > Anyways if such a thing was done it would be a long term project
> > > > > and that short term fix would be still needed.
> > > >
> > > > Your patch is not a fix. It's a lousy, horrible and unreliable
> > > > workaround. It's not fixing the root cause of the problem at hand.
> > >
> > > It fixes the bug in a minimally intrusive way.
> >
> > It papers over the problem. We already know that the NIC driver floods
> > the machine with interrupts, so why are you insisting that we need to
>
> Well in this case it's simply because it has 4 ports and they are all
> active and have a lot of MSI-X vectors for each stream.
>
> Even if you had the perfect interrupt handler that ran in
> one cycle, if you had enough of them in parallel from different ports
> there could be still a stack overflow problem on individual CPUs.

Not at all if the handler runs with irqs disabled.
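
A minimal sketch of that argument, as a userspace model rather than
kernel code. FRAME_BYTES, worst_case_irq_stack() and
irq_stack_overflows() are hypothetical names and numbers chosen for
illustration. The model assumes the worst timing: with irqs enabled,
every pending vector arrives before the previous handler has returned.

```c
#include <stddef.h>

/* Assumed stack use of one handler invocation; illustrative only. */
#define FRAME_BYTES 512u

/*
 * Worst-case IRQ stack usage on one CPU when `pending` vectors fire
 * back to back.
 *
 * irqs_enabled != 0: a handler can be interrupted by the next vector
 *                    before it returns, so frames pile up.
 * irqs_enabled == 0: each handler runs to completion before the next
 *                    vector is delivered, so at most one frame is live.
 */
size_t worst_case_irq_stack(size_t pending, int irqs_enabled)
{
	size_t depth = 0, max_depth = 0;
	size_t i;

	for (i = 0; i < pending; i++) {
		depth++;			/* handler entry */
		if (depth > max_depth)
			max_depth = depth;
		if (!irqs_enabled)
			depth--;		/* handler returns first */
	}
	return max_depth * FRAME_BYTES;
}

/* Returns 1 if the modelled worst case exceeds the IRQ stack size. */
int irq_stack_overflows(size_t pending, int irqs_enabled, size_t stack_bytes)
{
	return worst_case_irq_stack(pending, irqs_enabled) > stack_bytes;
}
```

In this model the nested case grows linearly with the number of
vectors, while the irqs-disabled case stays at a single frame no
matter how many vectors are pending or how short the handler is.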

> > The minimally intrusive way is a one liner in that very driver code and
> > if it causes problems for that very driver then we don't fix them by
> > adding a callback in the generic interrupt code path.
>
> Ok.
>
> >
> > The message which we would send out with applying that band aid would
> > be simply: Go ahead driver writers and let your handlers run as long
>
> Well it's simply the current state of affairs today. I'm merely
> attempting to make the current state slightly safer without breaking
> anything in the process.

Well, I'd agree if those stack overflows were a massively reported
problem.

Right now they happen with a weird test case which points out a
trouble spot: multi-vector NICs under heavy load. So why not go there
and change the handful of drivers to run their handlers with irqs
disabled?

Band aids are the last resort if we can't deal with a problem by other
sane means. And this problem does not fall into that category; it can be
solved in the affected drivers with zero effort.

Thanks,

tglx


