Re: [PATCH 1/2] nvme-apple: Do not try to shut down the controller twice

From: Hector Martin
Date: Wed Jan 11 2023 - 00:22:33 EST


On 11/01/2023 13.54, Christoph Hellwig wrote:
> On Wed, Jan 11, 2023 at 01:36:13PM +0900, Hector Martin wrote:
>> The blamed commit stopped explicitly disabling the controller when we do
>> a controlled shutdown, but apple_nvme_reset_work was only checking for
>> the disable bit before deciding to issue another disable. Check for the
>> shutdown state too, to avoid breakage.
>>
>> This issue does not affect nvme-pci, since it only issues controller
>> shutdowns when the system is actually shutting down anyway.
>
> There's a few other places where nvme-pci does a shutdown like
> probe/reset failure and most notably and mostly notably various
> power management scenarios.
>
> What path is causing a problem here for nvme-apple? I fear we're
> missing some highler level check here and getting further out of
> sync.
>

OK, so the first question is who is responsible for resetting the
controller after a shutdown? The spec requires a reset in order to bring
it back up from that state. Indeed the PCIe code does an explicit
disable right now (though, judging by the comment, it probably wasn't
done with the intent of resetting after a shutdown, it just happens to
work for that too :))

Right now, apple_nvme_reset_work() tries to check for the condition of
an enabled controller (under the assumption that it's coming from a live
controller, not considering shutdown/sleep) and issue an
apple_nvme_disable(). However, this doesn't work when resuming because
at that point the firmware coprocessor is shut down, so the device isn't
usable (can't even get a disable command to complete properly). Perhaps
a better conditional here would be to check for
apple_rtkit_is_running(), since apple_nvme_disable() can't work otherwise.

But then, even if we get past that, once apple_nvme_reset_work actually
resets the firmware CPU and kicks things back online, the controller is
still logically in the shutdown state. So something has to disable it in
order for nvme_enable_ctrl() to work.

An alternate patch would be to change the gate for apple_nvme_disable()
in apple_nvme_reset_work() to check for apple_rtkit_is_running() on top
of the controller enable state, and then add a further direct call to
nvme_disable_ctrl(..., false) later in apple_nvme_reset_work, once the
firmware is back up, to ensure we can enable it after. Thoughts?

Alternatively, we could just revert to the prior behavior of always
issuing a disable after a shutdown. We need to disable at some point to
come back anyway, so it might as well be done early (before we shut down
firmware, so it works).

- Hector