Re: [PATCH v6 4/5] PCI: only return true when dev io state is really changed

From: Bjorn Helgaas
Date: Fri Oct 02 2020 - 13:29:17 EST


[+cc Sinan]

On Wed, Sep 30, 2020 at 03:05:36AM -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
>
> pcie_do_recovery()
> ->pci_walk_bus()
> ->report_frozen_detected()
>
> with pci_channel_io_frozen the same time.
> If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
> The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.
>
> Signed-off-by: Ethan Zhao <haifeng.zhao@xxxxxxxxx>
> Tested-by: Wen Jin <wen.jin@xxxxxxxxx>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@xxxxxxxxx>
> Reviewed-by: Alexandru Gagniuc <mr.nuke.me@xxxxxxxxx>
> Reviewed-by: Andy Shevchenko <andy.shevchenko@xxxxxxxxx>
> ---
> Changnes:
> v2: revise description and code according to suggestion from Andy.
> v3: change code to simpler.
> v4: no change.
> v5: no change.
> v6: no change.
>
> drivers/pci/pci.h | 37 +++++--------------------------------
> 1 file changed, 5 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 455b32187abd..f2beeaeda321 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -359,39 +359,12 @@ struct pci_sriov {
> static inline bool pci_dev_set_io_state(struct pci_dev *dev,
> pci_channel_state_t new)
> {
> - bool changed = false;
> -
> device_lock_assert(&dev->dev);
> - switch (new) {
> - case pci_channel_io_perm_failure:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - case pci_channel_io_perm_failure:
> - changed = true;
> - break;
> - }
> - break;
> - case pci_channel_io_frozen:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - changed = true;
> - break;
> - }
> - break;
> - case pci_channel_io_normal:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - changed = true;
> - break;
> - }
> - break;
> - }
> - if (changed)
> - dev->error_state = new;
> - return changed;
> + if (dev->error_state == new)
> + return false;
> +
> + dev->error_state = new;
> + return true;
> }

IIUC this changes the behavior of the function, but it's difficult to
analyze because it does a lot of simplification at the same time.

Please consider the following, which is intended to simplify the
function while preserving the behavior (but please verify; it's been a
long time since I looked at this). Then maybe see how your patch
could be done on top of this?

Alternatively, come up with your own simplification patch + the
functionality change.


commit 983d9b1f8177 ("PCI/ERR: Simplify pci_dev_set_io_state()")
Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Date: Tue May 19 12:28:57 2020 -0500

PCI/ERR: Simplify pci_dev_set_io_state()

Truth table:

requested new state
current ------------------------------------------
state normal frozen perm_failure
------------ + ------------- ------------- ------------
normal | normal frozen perm_failure
frozen | normal frozen perm_failure
perm_failure | perm_failure* perm_failure* perm_failure

* "not changed", returns false

No functional change intended.

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f75867106..81408552f7c9 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -358,39 +358,21 @@ struct pci_sriov {
static inline bool pci_dev_set_io_state(struct pci_dev *dev,
pci_channel_state_t new)
{
- bool changed = false;
-
device_lock_assert(&dev->dev);
- switch (new) {
- case pci_channel_io_perm_failure:
- switch (dev->error_state) {
- case pci_channel_io_frozen:
- case pci_channel_io_normal:
- case pci_channel_io_perm_failure:
- changed = true;
- break;
- }
- break;
- case pci_channel_io_frozen:
- switch (dev->error_state) {
- case pci_channel_io_frozen:
- case pci_channel_io_normal:
- changed = true;
- break;
- }
- break;
- case pci_channel_io_normal:
- switch (dev->error_state) {
- case pci_channel_io_frozen:
- case pci_channel_io_normal:
- changed = true;
- break;
- }
- break;
+
+ /* Can always put a device in perm_failure state */
+ if (new == pci_channel_io_perm_failure) {
+ dev->error_state == pci_channel_io_perm_failure;
+ return true;
}
- if (changed)
- dev->error_state = new;
- return changed;
+
+ /* If already in perm_failure, can't set to normal or frozen */
+ if (dev->error_state == pci_channel_io_perm_failure)
+ return false;
+
+ /* Can always change normal to frozen or vice versa */
+ dev->error_state = new;
+ return true;
}

static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)