Re: [PATCH] ixgb: add PCI Error recovery callbacks

From: Linas Vepstas
Date: Thu Jul 06 2006 - 12:15:09 EST


On Thu, Jul 06, 2006 at 09:21:39AM +0800, Zhang, Yanmin wrote:
> On Thu, 2006-07-06 at 03:44, Linas Vepstas wrote:
> > On Wed, Jul 05, 2006 at 08:49:27AM -0700, Auke Kok wrote:
> > > Zhang, Yanmin wrote:
> > > >On Fri, 2006-06-30 at 00:26, Linas Vepstas wrote:
> > > >>Adds PCI Error recovery callbacks to the Intel 10-gigabit ethernet
> > > >>ixgb device driver. Lightly tested, works.
> > > >
> > > >Both pci_disable_device and ixgb_down would access the device. It doesn't
> > > >follow Documentation/pci-error-recovery.txt that error_detected shouldn't
> > > >do
> > > >any access to the device.
> > >
> > > Moreover, it was Linas who wrote this documentation in the first place :)
> >
> > On the pSeries, its harmless to try to do i/o; the i/o will e blocked.
> In the future, we might move the pci error recovery codes to generic to
> support other platforms which might not block I/O. So it's better to follow
> Documentation/pci-error-recovery.txt when adding error recovery codes into driver.

Or we could change the documentation. The point was that doing
unexpected i/o after the aapter reset is likely to wedge the adapter
again, leading to an inf loop of resets. As a practical matter,
I found that, while developing this patch, and the other related
patches, that this was indeed the usual failure mode: incorrect bringup
just lead to more errors.

What I really want to do is to perform as clean a shut-down as possible,
reset the adapter, and then bring it back up. I'm concerned that changing
the order to "reset"-"shutdown-"bringup" would be inappropriate.

Perhaps the right fix is to figure out what parts of the driver do i/o
during shutdown, and then add a line "if(wedged) skip i/o;" to those
places?

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/