Re: [BUG][5.18rc5] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10

From: Keith Busch
Date: Wed Feb 22 2023 - 10:37:42 EST


On Wed, Feb 22, 2023 at 06:59:59PM +0500, Mikhail Gavrilov wrote:
> On Thu, May 5, 2022 at 10:19 AM Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> > The troubleshooting steps for your observation are to:
> >
> > 1. Turn off APST (nvme_core.default_ps_max_latency_us=0)
> > 2. Turn off ASPM (pcie_aspm=off)
> > 3. Turn off both
> >
> > Typically one of those resolves the issue.
>
> What should I do if none of these steps helped? I attached a log which
> shows that I am using both parameters,
> nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off.

Those are just the most readily available things we can tune at
this level that have helped on *some* platform/device combinations.
They are certainly not going to solve every problem.
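
To double-check that both parameters really reached the running kernel,
something like the following could be used. This is only a minimal
sketch: the helper names (parse_cmdline, check_power_params,
read_cmdline) are made up for illustration, and the parser splits on
whitespace, so it does not handle quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def check_power_params(cmdline):
    """Report whether the two power-saving workarounds are on the command line."""
    params = parse_cmdline(cmdline)
    return {
        # APST is disabled when the maximum allowed latency is 0
        "apst_off": params.get("nvme_core.default_ps_max_latency_us") == "0",
        # ASPM is disabled with pcie_aspm=off
        "aspm_off": params.get("pcie_aspm") == "off",
    }


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On a live system, check_power_params(read_cmdline()) should report True
for both keys if the parameters were applied at boot.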

You are showing that the driver can't read from the device's memory,
and there's nothing the driver can do about that. This is usually
some platform BIOS breakage well below the visibility of the NVMe
driver.
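
For reference, the two values in the subject line can be decoded
roughly as below. This is a sketch of my reading, not driver code: the
function names are made up, only the RDY and CFS bits of CSTS are
handled, and the PCI status masks are the standard error bits from the
PCI status register. An all-ones CSTS means the register read itself
failed, while PCI_STATUS=0x10 has only the capabilities-list bit set,
so config space still answers without reporting an error.

```python
CSTS_RDY = 1 << 0   # controller ready
CSTS_CFS = 1 << 1   # controller fatal status

# Standard PCI status register bits (16-bit register at config offset 0x06)
PCI_STATUS_BITS = {
    0x0010: "CAP_LIST",
    0x0100: "MASTER_DATA_PARITY_ERROR",
    0x0800: "SIGNALED_TARGET_ABORT",
    0x1000: "RECEIVED_TARGET_ABORT",
    0x2000: "RECEIVED_MASTER_ABORT",
    0x4000: "SIGNALED_SYSTEM_ERROR",
    0x8000: "DETECTED_PARITY_ERROR",
}


def describe_csts(csts):
    """Give a short human-readable reading of a 32-bit CSTS value."""
    if csts == 0xFFFFFFFF:
        # All ones is what a failed MMIO read returns
        return "all ones: register read did not reach the device"
    flags = []
    if csts & CSTS_RDY:
        flags.append("RDY")
    if csts & CSTS_CFS:
        flags.append("CFS")
    return "readable: " + (", ".join(flags) if flags else "no RDY/CFS")


def describe_pci_status(status):
    """List the names of the known bits set in a PCI status value."""
    return [name for bit, name in sorted(PCI_STATUS_BITS.items()) if status & bit]
```

With the values from this report, describe_csts(0xffffffff) takes the
all-ones branch and describe_pci_status(0x10) lists only CAP_LIST.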

Perhaps your platform's bridge windows are screwed up. One other
thing you can try is adding the kernel parameter "pci=nocrs" to have
the kernel ignore the ACPI-provided host bridge windows when setting
these up.
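
One rough way to sanity-check a window is to compare the device's BAR
against the memory range of the bridge above it, both of which appear
in "lspci -vv" output and in /proc/iomem. The sketch below only does
the range arithmetic; the function name and the addresses in the
example are hypothetical and not taken from the attached log.

```python
def bar_inside_window(bar_start, bar_size, win_start, win_end):
    """Return True if the BAR lies entirely within the bridge window.

    win_start and win_end are inclusive, matching how lspci prints
    "Memory behind bridge: <start>-<end>".
    """
    bar_end = bar_start + bar_size - 1
    return win_start <= bar_start and bar_end <= win_end


# Hypothetical example: a 16K BAR and a 1M bridge window
EXAMPLE_BAR = (0xFC700000, 0x4000)
EXAMPLE_WINDOW = (0xFC700000, 0xFC7FFFFF)
```

A BAR that falls outside every upstream window would not be reachable
by memory reads, which would fit the all-ones register value.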