Re: [PATCH] PCI: Exit restore process when device is still powerdown

From: Bjorn Helgaas
Date: Thu Jan 12 2023 - 17:21:59 EST


On Thu, Dec 22, 2022 at 12:41:04PM +0000, Jiantao Zhang wrote:
> We get this stack when the rp doesn't power up in resume noirq:

s/rp/Root Port/

"resume noirq" seems to refer to a function, so please mention the
exact function name.

> dump_backtrace.cfi_jt+0x0/0x4
> dump_stack_lvl+0xb4/0x10c
> show_regs_before_dump_stack+0x1c/0x30
> arm64_serror_panic+0x110/0x1a8
> do_serror+0x16c/0x1cc
> el1_error+0x8c/0x10c
> do_raw_spin_unlock+0x74/0xdc
> pci_bus_read_config_word+0xdc/0x1dc
> pci_restore_msi_state+0x2f4/0x36c
> pci_restore_state+0x13f0/0x1444
> pci_pm_resume_noirq+0x158/0x318
> dpm_run_callback+0x178/0x5e8
> device_resume_noirq+0x250/0x264
> async_resume_noirq+0x20/0xf8
> async_run_entry_fn+0xfc/0x364
> process_one_work+0x37c/0x7f4
> worker_thread+0x3e8/0x754
> kthread+0x168/0x204
> ret_from_fork+0x10/0x18
>
> The ep device uses msix, the restore process will write bar space
> in __pci_msix_desc_mask_irq, which will result in accessing the
> powerdown area when the rp doesn't power on.

s/ep/endpoint/
s/msix/MSI-X/ to match spec usage
s/bar/BAR/
Add "()" after function names, e.g., __pci_msix_desc_mask_irq()
s/rp/Root Port/

> It makes sense we should do nothing when the device is still powerdown.
>
> Signed-off-by: Jianrong Zhang <zhangjianrong5@xxxxxxxxxx>
> Signed-off-by: Jiantao Zhang <water.zhangjiantao@xxxxxxxxxx>
> ---
> drivers/pci/pci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index fba95486caaf..279f6e8c5a00 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1764,7 +1764,7 @@ static void pci_restore_rebar_state(struct pci_dev *pdev)
>   */
>  void pci_restore_state(struct pci_dev *dev)
>  {
> -	if (!dev->state_saved)
> +	if (!dev->state_saved || dev->current_state == PCI_UNKNOWN)
>  		return;

This doesn't seem right to me; it looks like we're covering up a
problem elsewhere.

If we need access to the endpoint to restore state, shouldn't we
ensure that the endpoint is powered up before we try to access it?

We depend on the state being restored, so if we skip the restore here,
where *will* it happen?
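
To illustrate the concern, here is a rough userspace model in C.  It is
not kernel code, and every name in it (model_dev, model_power_up(),
model_restore_skip(), model_restore_after_power_up()) is made up for
this example.  It only contrasts "skip the restore when the power state
is unknown" with "power up the path to the device first, then restore",
assuming power-up always succeeds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device power state; not the kernel's pci_power_t. */
enum model_power { MODEL_D0, MODEL_UNKNOWN };

/* Hypothetical stand-in for a device with an optional upstream bridge. */
struct model_dev {
	struct model_dev *parent;	/* Root Port or bridge, NULL at the top */
	enum model_power state;
	bool state_saved;		/* saved state is waiting to be restored */
	bool restored;			/* restore actually ran */
};

/* Power up everything upstream first, then the device itself. */
void model_power_up(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->parent)
		model_power_up(dev->parent);
	dev->state = MODEL_D0;
}

/* Models the patch: bail out if the power state is unknown. */
void model_restore_skip(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->state_saved || dev->state == MODEL_UNKNOWN)
		return;		/* saved state is left pending, nobody restores it */
	dev->restored = true;
	dev->state_saved = false;
}

/* Models the alternative: make the device reachable, then restore. */
void model_restore_after_power_up(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->state_saved)
		return;
	model_power_up(dev);
	dev->restored = true;
	dev->state_saved = false;
}
```

In this model, an endpoint below a powered-down Root Port is left with
state_saved still set and restored still false after
model_restore_skip(), which is the "where will it happen?" problem.
With model_restore_after_power_up() the Root Port is brought up first
and the restore completes.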

Bjorn