Re: [PATCH] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

From: Bjorn Helgaas
Date: Mon May 20 2013 - 18:48:35 EST


On Fri, Apr 26, 2013 at 06:28:59AM +0000, Zhang, LongX wrote:
> From: Zhang Long <longx.zhang@xxxxxxxxx>
>
> Specific pci device drivers might have many functions to call
> pci_channel_offline to check device states. When slot_reset happens,
> drivers' slot_reset callback might call such functions and eventually
> abort the reset.
>
> The patch resets pdev->error_state to pci_channel_io_normal at
> the beginning of report_slot_reset.
>
> Thanks to Liu Joseph for pointing it out.
>
> Signed-off-by: Zhang Yanmin <yanmin_zhang@xxxxxxxxxxxxxxx>
> Signed-off-by: Zhang Long <longx.zhang@xxxxxxxxx>
> ---
> drivers/pci/pcie/aer/aerdrv_core.c | 1 +
> drivers/pci/pcie/portdrv_pci.c | 12 +++++-------
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 564d97f..c61fd44 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -286,6 +286,7 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
> result_data = (struct aer_broadcast_data *) data;
>
> device_lock(&dev->dev);
> + dev->error_state = pci_channel_io_normal;
> if (!dev->driver ||
> !dev->driver->err_handler ||
> !dev->driver->err_handler->slot_reset)
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index ed4d094..7abefd9 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -332,13 +332,11 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
> pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
> int retval;
>
> - /* If fatal, restore cfg space for possible link reset at upstream */
> - if (dev->error_state == pci_channel_io_frozen) {
> - dev->state_saved = true;
> - pci_restore_state(dev);
> - pcie_portdrv_restore_config(dev);
> - pci_enable_pcie_error_reporting(dev);
> - }
> + /* restore cfg space for possible link reset at upstream */
> + dev->state_saved = true;
> + pci_restore_state(dev);
> + pcie_portdrv_restore_config(dev);
> + pci_enable_pcie_error_reporting(dev);
>
> /* get true return value from &status */
> retval = device_for_each_child(&dev->dev, &status, slot_reset_iter);

I think this patch changes the behavior in the case of a non-fatal error
where one of the .error_detected() methods returned
PCI_ERS_RESULT_NEED_RESET. In that case, pcie_portdrv_slot_reset()
previously did not restore config space, but after your patch, it *will*
restore it. We need an explanation of why this is safe.
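
To make the behavior change concrete, here is a minimal user-space sketch
of the condition before and after the patch.  It is not kernel code: the
enum and the helper names are hypothetical stand-ins, and it assumes that
in the non-fatal case dev->error_state is pci_channel_io_normal when
pcie_portdrv_slot_reset() runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's channel state type; names are illustrative. */
enum channel_state {
	CHANNEL_IO_NORMAL,
	CHANNEL_IO_FROZEN,
	CHANNEL_IO_PERM_FAILURE,
};

/* Before the patch: config space is restored only for a frozen channel. */
static bool restores_config_before(enum channel_state error_state)
{
	return error_state == CHANNEL_IO_FROZEN;
}

/* After the patch: the test is removed, so the restore is unconditional. */
static bool restores_config_after(enum channel_state error_state)
{
	(void)error_state;	/* no longer consulted */
	return true;
}

/* Hypothetical self-check covering the two cases discussed above. */
static void check_restore_behavior(void)
{
	/* Fatal error (frozen channel): both versions restore config space. */
	assert(restores_config_before(CHANNEL_IO_FROZEN));
	assert(restores_config_after(CHANNEL_IO_FROZEN));

	/* Non-fatal error with NEED_RESET (assumed normal channel state):
	 * only the patched version restores config space. */
	assert(!restores_config_before(CHANNEL_IO_NORMAL));
	assert(restores_config_after(CHANNEL_IO_NORMAL));
}
```

The second pair of assertions is the case that needs the explanation.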

I think you should split this into two patches: the first would remove the
"if (dev->error_state == pci_channel_io_frozen)" test from portdrv_pci.c
and explain the reason, and the second would make the aerdrv_core.c change.

I'm also concerned that in that same case (a non-fatal error where one of
the .error_detected() methods returned PCI_ERS_RESULT_NEED_RESET), I don't
think we actually *do* any kind of device reset. This isn't related to
your patch, of course, so if you resolve the config space restore question,
we can deal with the reset question later.
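
A rough model of that concern, again as a user-space sketch rather than
the actual AER recovery code.  All names here are hypothetical, and the
model assumes (as I believe is the case, but have not verified here) that
the link reset is performed only for fatal errors, while the slot_reset
callbacks are invoked whenever NEED_RESET was returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; not the kernel's definitions. */
enum severity { SEV_NONFATAL, SEV_FATAL };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_RECOVERED };

struct recovery_trace {
	bool link_reset_done;	/* was any reset actually performed? */
	bool slot_reset_called;	/* were drivers told a reset happened? */
};

/* Assumed flow: reset only on fatal errors, callbacks on NEED_RESET. */
static struct recovery_trace model_recovery(enum severity sev,
					    enum ers_result detected)
{
	struct recovery_trace t = { false, false };

	if (sev == SEV_FATAL)
		t.link_reset_done = true;

	if (detected == ERS_NEED_RESET)
		t.slot_reset_called = true;

	return t;
}
```

Under those assumptions, model_recovery(SEV_NONFATAL, ERS_NEED_RESET)
yields slot_reset_called set with link_reset_done clear, i.e. the drivers'
.slot_reset() methods run without any reset having taken place.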

Bjorn