Re: [Bug] nvme blocks PC10 since v5.15 - bisected

From: Keith Busch
Date: Fri Jan 21 2022 - 16:09:11 EST

On Fri, Jan 21, 2022 at 08:00:49PM +0100, Rafael J. Wysocki wrote:
> Hi Keith,
>
> It is reported that the following commit
>
> commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
> Author: Keith Busch <kbusch@xxxxxxxxxx>
> Date: Tue Jul 27 09:40:44 2021 -0700
>
> nvme-pci: disable hmb on idle suspend
>
> An idle suspend may or may not disable host memory access from devices
> placed in low power mode. Either way, it should always be safe to
> disable the host memory buffer prior to entering the low power mode, and
> this should also always be faster than a full device shutdown.
>
> Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
> Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
>
> is the source of a serious power regression occurring since 5.15
> (please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).
>
> After this commit, the SoC on the affected system cannot enter
> C-states deeper than PC2 while suspended to idle which basically
> defeats the purpose of suspending.
>
> What may be happening is that nvme_disable_prepare_reset(), which is no
> longer called in the ndev->nr_host_mem_descs case, somehow causes the
> device's LTR to change to "no requirement", allowing deeper C-states
> to be entered.
>
> Can you have a look at this, please?

I thought platforms that wanted full device shutdown behaviour would
always set acpi_storage_d3. Is that not happening here?
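The decision being discussed can be pictured as a small model. This is a simplified sketch, not the driver's nvme_suspend(): the struct, enum and function names are hypothetical, the HMB-disable step itself is not modeled, and it assumes (as the reply implies) that acpi_storage_d3() returning true makes the driver take the full-shutdown path, here represented by a "simple suspend" flag. The only difference between the two functions is whether an active host memory buffer (the ndev->nr_host_mem_descs case) forces a full shutdown.

```c
#include <stdbool.h>

/* Hypothetical inputs to the suspend-path decision (illustrative names only). */
struct suspend_inputs {
	bool suspend_via_firmware; /* firmware involved after suspend, e.g. S3 */
	bool has_power_states;     /* controller reports non-default power states */
	bool aspm_enabled;         /* PCIe ASPM is enabled for the device link */
	bool simple_suspend_quirk; /* assumed to be set when acpi_storage_d3() is true */
	bool hmb_enabled;          /* a host memory buffer is in use */
};

enum suspend_path {
	FULL_SHUTDOWN,     /* nvme_disable_prepare_reset() path: device shut down */
	HOST_MANAGED_IDLE, /* device left in an NVMe low-power state */
};

/* Model of the behaviour before e5ad96f388b7: an active HMB forces shutdown. */
static enum suspend_path path_before(const struct suspend_inputs *in)
{
	if (in->suspend_via_firmware || !in->has_power_states ||
	    !in->aspm_enabled || in->hmb_enabled || in->simple_suspend_quirk)
		return FULL_SHUTDOWN;
	return HOST_MANAGED_IDLE;
}

/* Model of the behaviour after e5ad96f388b7: the HMB no longer forces
 * shutdown (it is disabled instead), so only the other conditions do. */
static enum suspend_path path_after(const struct suspend_inputs *in)
{
	if (in->suspend_via_firmware || !in->has_power_states ||
	    !in->aspm_enabled || in->simple_suspend_quirk)
		return FULL_SHUTDOWN;
	return HOST_MANAGED_IDLE;
}
```

In this model, suspend-to-idle with ASPM enabled, power states available, an HMB in use and no StorageD3Enable property goes from FULL_SHUTDOWN to HOST_MANAGED_IDLE after the commit, while setting simple_suspend_quirk keeps the full-shutdown path in both versions, which is the behaviour the question about acpi_storage_d3 is probing.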