[Bug] nvme blocks PC10 since v5.15 - bisected

From: Rafael J. Wysocki
Date: Fri Jan 21 2022 - 14:01:03 EST


Hi Keith,

It is reported that the following commit

commit e5ad96f388b765fe6b52f64f37e910c0ba4f3de7
Author: Keith Busch <kbusch@xxxxxxxxxx>
Date: Tue Jul 27 09:40:44 2021 -0700

    nvme-pci: disable hmb on idle suspend

    An idle suspend may or may not disable host memory access from devices
    placed in low power mode. Either way, it should always be safe to
    disable the host memory buffer prior to entering the low power mode, and
    this should also always be faster than a full device shutdown.

    Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
    Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>

is the source of a serious power regression occurring since 5.15
(please see https://bugzilla.kernel.org/show_bug.cgi?id=215467).

After this commit, the SoC on the affected system cannot enter
package C-states deeper than PC2 while suspended to idle, which
basically defeats the purpose of suspending.
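
For anyone who wants to check this on their own machine, here is a rough
sketch (not part of the bug report) that compares package C-state counters
taken before and after a suspend-to-idle cycle. It assumes an Intel system
where the pmc_core driver exposes package_cstate_show in debugfs and that
its lines look like "Package C10 : 12345"; the path, the line format and
both helper names are assumptions made for illustration.

```python
import re

# Assumed location of the counters (Intel pmc_core debugfs, readable by root).
PKG_CSTATE_FILE = "/sys/kernel/debug/pmc_core/package_cstate_show"


def parse_pkg_cstates(text):
    """Parse lines such as 'Package C10 : 12345' into {'C10': 12345}."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"\s*Package\s+(C\d+)\s*:\s*(\d+)", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def deepest_state_entered(before, after):
    """Return the deepest package C-state whose counter grew, or None."""
    grew = [s for s in after if after[s] > before.get(s, 0)]
    # Compare by the numeric part so that C10 ranks deeper than C2.
    return max(grew, key=lambda s: int(s[1:]), default=None)
```

Read the file once before and once after the suspend cycle, parse both
snapshots and pass them to deepest_state_entered(). If the result stays
at C2, that would match the behavior described above.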

What may be happening is that nvme_disable_prepare_reset(), which is
no longer called in the ndev->nr_host_mem_descs case, somehow causes
the LTR of the device to change to "no requirement", which allows
deeper C-states to be entered.

Can you have a look at this, please?

Cheers,
Rafael