Re: [PATCH] PCI Error Recovery: e100 network device driver

From: Zhang, Yanmin
Date: Fri Apr 28 2006 - 22:51:18 EST


On Fri, 2006-04-07 at 06:24, Linas Vepstas wrote:
> Please apply and forward upstream.
>
> --linas
>
> [PATCH] PCI Error Recovery: e100 network device driver
>
> Various PCI bus errors can be signaled by newer PCI controllers. This
> patch adds the PCI error recovery callbacks to the intel ethernet e100
> device driver. The patch has been tested, and appears to work well.
>
> Signed-off-by: Linas Vepstas <linas@xxxxxxxxx>
> Acked-by: Jesse Brandeburg <jesse.brandeburg@xxxxxxxxx>
I am enabling PCI-Express AER (Advanced Error Reporting) in kernel and
glad to see many drivers to support pci error handling.


>
> ----
>
> drivers/net/e100.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 65 insertions(+)
>
> Index: linux-2.6.17-rc1/drivers/net/e100.c
> ===================================================================
> --- linux-2.6.17-rc1.orig/drivers/net/e100.c 2006-04-05 09:56:06.000000000 -0500
> +++ linux-2.6.17-rc1/drivers/net/e100.c 2006-04-06 15:17:29.000000000 -0500
> @@ -2781,6 +2781,70 @@ static void e100_shutdown(struct pci_dev
> }
>
>
> +/* ------------------ PCI Error Recovery infrastructure -------------- */
> +/** e100_io_error_detected() is called when PCI error is detected */
> +static pci_ers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> + struct net_device *netdev = pci_get_drvdata(pdev);
> +
> + /* Same as calling e100_down(netdev_priv(netdev)), but generic */
> + netdev->stop(netdev);
e100 stop method e100_close calls e100_down which would do IO. Does it
violate the rule defined in Documentation/pci-error-recovery.txt that
error_detected shouldn't do any IO?
Suggest to create a new function, such like e100_close_noreset.


> +
> + /* Detach; put netif into state similar to hotplug unplug */
> + netif_poll_enable(netdev);
> + netif_device_detach(netdev);
> +
> + /* Request a slot reset. */
> + return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +/** e100_io_slot_reset is called after the pci bus has been reset.
> + * Restart the card from scratch. */
> +static pci_ers_result_t e100_io_slot_reset(struct pci_dev *pdev)
> +{
> + struct net_device *netdev = pci_get_drvdata(pdev);
> + struct nic *nic = netdev_priv(netdev);
> +
> + if(pci_enable_device(pdev)) {
> + printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
> + return PCI_ERS_RESULT_DISCONNECT;
> + }
> + pci_set_master(pdev);
> +
> + /* Only one device per card can do a reset */
> + if (0 != PCI_FUNC (pdev->devfn))
> + return PCI_ERS_RESULT_RECOVERED;
> + e100_hw_reset(nic);
> + e100_phy_init(nic);
Should pci_set_master be called after e100_hw_reset like in function
e100_probe?


> +
> + return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +/** e100_io_resume is called when the error recovery driver
> + * tells us that its OK to resume normal operation.
> + */
> +static void e100_io_resume(struct pci_dev *pdev)
> +{
> + struct net_device *netdev = pci_get_drvdata(pdev);
> + struct nic *nic = netdev_priv(netdev);
> +
> + /* ack any pending wake events, disable PME */
> + pci_enable_wake(pdev, 0, 0);
> +
> + netif_device_attach(netdev);
> + if(netif_running(netdev)) {
> + e100_open (netdev);
> + mod_timer(&nic->watchdog, jiffies);
e100_open calls e100_up which already sets watchdog timer. Why to set
it again?

> + }
> +}
> +
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/