Re: [PATCH v5 2/3] PCI: qcom: Restrict pci transactions after pci suspend

From: Krishna Chaitanya Chundru
Date: Mon Aug 29 2022 - 13:32:45 EST

Next message: Sean Christopherson: "Re: [RFC PATCH 0/2] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT"
Previous message: William McVicker: "Re: [PATCH v5 20/20] PCI: dwc: Add Baikal-T1 PCIe controller support"
In reply to: Manivannan Sadhasivam: "Re: [PATCH v5 2/3] PCI: qcom: Restrict pci transactions after pci suspend"
Next in thread: Manivannan Sadhasivam: "Re: [PATCH v5 2/3] PCI: qcom: Restrict pci transactions after pci suspend"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 8/27/2022 10:56 PM, Manivannan Sadhasivam wrote:

On Fri, Aug 26, 2022 at 03:23:00PM -0500, Stephen Boyd wrote:

Quoting Krishna Chaitanya Chundru (2022-08-25 06:52:43)

On 8/24/2022 10:50 PM, Stephen Boyd wrote:

Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59)

On 8/9/2022 12:42 AM, Stephen Boyd wrote:

Quoting Krishna chaitanya chundru (2022-08-03 04:28:53)

If the endpoint device state is D0 and irq's are not freed, then
kernel try to mask interrupts in system suspend path by writing
in to the vector table (for MSIX interrupts) and config space (for MSI's).

These transactions are initiated in the pm suspend after pcie clocks got
disabled as part of platform driver pm suspend call. Due to it, these
transactions are resulting in un-clocked access and eventually to crashes.

Why are the platform driver pm suspend calls disabling clks that early?
Can they disable clks in noirq phase, or even later, so that we don't
have to check if the device is clocking in the irq poking functions?
It's best to keep irq operations fast, so that irq control is fast given
that these functions are called from irq flow handlers.

We are registering the pcie pm suspend ops as noirq ops only. And this
msix and config

access is coming at the later point of time that is reason we added that
check.

What is accessing msix and config? Can you dump_stack() after noirq ops
are called and figure out what is trying to access the bus when it is
powered down?

The msix and config space is being accessed to mask interrupts. The
access is coming at the end of the suspend

and near CPU disable. We tried to dump the stack there but the call
stack is not coming as it is near cpu disable.

That is odd that you can't get a stacktrace.

But we got dump at resume please have look at it

[   54.946268] Enabling non-boot CPUs ...
[   54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105
43491e4414b1db8a6f59d56b617b520d92a9498e
[   54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP
SKU2 platform (DT)
[   54.969088] Call trace:
[   54.971612] dump_backtrace+0x0/0x200
[   54.975399] show_stack+0x20/0x2c
[   54.978826] dump_stack_lvl+0x6c/0x90
[   54.982614] dump_stack+0x18/0x38
[   54.986043] dw_msi_unmask_irq+0x2c/0x58
[   54.990096] irq_enable+0x58/0x90
[   54.993522] __irq_startup+0x68/0x94
[   54.997216] irq_startup+0xf4/0x140
[   55.000820] irq_affinity_online_cpu+0xc8/0x154
[   55.005491] cpuhp_invoke_callback+0x19c/0x6e4
[   55.010077] cpuhp_thread_fun+0x11c/0x188
[   55.014216] smpboot_thread_fn+0x1ac/0x30c
[   55.018445] kthread+0x140/0x30c
[   55.021788] ret_from_fork+0x10/0x20
[   55.028243] CPU1 is up

So the same stack should be called at the suspend path while disabling CPU.

Sounds like you're getting hit by affinity changes while offlining CPUs
during suspend (see irq_migrate_all_off_this_cpu()). That will happen
after devices are suspended (all phases of suspend ops).

The affinity setting should not happen since DWC MSI controller doesn't support
setting IRQ affinity (hierarchial IRQ domain). In the migrate_one_irq()
function, there is a check for the existence of the irq_set_affinity()
callback, but the DWC MSI controller return -EINVAL in the callback. So this
is the reason the migration was still atempted?

A quick check would be to test this suspend/resume with GIC ITS for MSI since
it supports settings IRQ affinity and resides in a separate domain.
Chaitanya, can you try that?

Hi mani,

I tried with gic its there also I see same behavior.

The only which helps to comment out affinity in the following function.

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 21b3ac2a29d2..042afec1cf9d 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -487,8 +487,9 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,

               if (affinity) {
                        if (affinity->is_managed) {
-                               flags = IRQD_AFFINITY_MANAGED |
-                                       IRQD_MANAGED_SHUTDOWN;
+//                             flags = IRQD_AFFINITY_MANAGED |
+//                                     IRQD_MANAGED_SHUTDOWN;
+                               flags = 0;//IRQD_AFFINITY_MANAGED |
                        }
                        mask = &affinity->mask;
                        node = cpu_to_node(cpumask_first(mask));

If there is any other way to remove these calls can you please help us
point that way.

I'm not sure. I believe genirq assumes the irqchips are always
accessible. There is some support to suspend irqchips. See how the
struct irq_chip::irq_suspend() function is called by syscore ops in the
generic irqchip 'irq_gc_syscore_ops' hooks. Maybe you could add a
syscore suspend/resume hook to disable/enable the clks and power to the
PCI controller. syscore ops run after secondary CPUs are hotplugged out
during suspend.

Or maybe setting the IRQCHIP_MASK_ON_SUSPEND flag can be used so that on
irq migration nothing writes the irq hardware because it is already
masked in the hardware earlier. I think the problem is that on resume
we'll restart the irq from the first CPU online event, when you don't
want to do that because it is too early.

I have another question though, which is do MSIs support wakeup? I don't
see how it works if the whole bus is effectively off during suspend. If
wakeup needs to be supported then I suspect the bus can't be powered
down during suspend.

Wake up should be handled by a dedicated side-band GPIO or in-band PME message.

But I still wonder how the link stays in L1/L1ss when the clocks are disabled
and PHY is powered down. Maybe the link or phy is powered by a separate power
domain like MX that keeps the link active?

We will come back to you on this.

Thanks,
Mani

Next message: Sean Christopherson: "Re: [RFC PATCH 0/2] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT"
Previous message: William McVicker: "Re: [PATCH v5 20/20] PCI: dwc: Add Baikal-T1 PCIe controller support"
In reply to: Manivannan Sadhasivam: "Re: [PATCH v5 2/3] PCI: qcom: Restrict pci transactions after pci suspend"
Next in thread: Manivannan Sadhasivam: "Re: [PATCH v5 2/3] PCI: qcom: Restrict pci transactions after pci suspend"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]