Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

From: Alex Elder
Date: Fri May 22 2020 - 16:52:49 EST


On 5/22/20 12:28 AM, Luis Chamberlain wrote:
On Tue, May 19, 2020 at 05:34:13PM -0500, Alex Elder wrote:
On 5/15/20 4:28 PM, Luis Chamberlain wrote:
This makes use of the new module_firmware_crashed() to help
annotate when firmware for device drivers crash. When firmware
crashes devices can sometimes become unresponsive, and recovery
sometimes requires a driver unload / reload and in the worst cases
a reboot.

Using a taint flag allows us to annotate when this happens clearly.

I don't fully understand what this is meant to do, so I can't
fully assess whether it's the right thing to do.

It is meant to taint the kernel to ensure it is clear that something
critically bad has happened with the device firmware, it crashed, and
recovery may or may not happen, we are not 100% certain.

But in this particular place in the IPA code, the *modem* has
crashed. And the IPA driver is not responsible for modem
firmware, remoteproc is.

Oi vei. So the device it depends on has crashed.

Yes, more or less. It could be considered a little more
nuanced than even that, but I won't get into it here.

The IPA driver *can* be responsible for loading some other
firmware, but even in that case, it only happens on initial
boot, and it's basically assumed to never crash.

OK is this an issue which we can recover from? If for the slightest bit
this can affect users it is something we should inform them over.

If the IPA driver successfully loads this firmware, it should
be assumed to never crash. So in that respect, there is no
recovery required.

Again, the modem (with which the IPA hardware and driver
interacts) can crash, or it can be shut down intentionally.
And in either case it can recover, automatically or manually.
But all of that (and the modem's firmware loading) is the
responsibility of the remoteproc subsystem, not IPA.

This patch set is missing uevents for these issues, but I just added
support for this.

So regardless of whether this module_firmware_crashed() call is
appropriate in some places, I believe it should not be used here.

OK thanks. Can the user be affected by this crash? If so how? Can
we recover ? Is that always guaranteed?

We can't guarantee anything about recovering from a crash of
an independent entity. But by design, recovery from a modem
crash is possible and is supposed to leave Linux in a
consistent state. A modem crash is likely to be observable
to the user.

I'll repeat that I don't believe the new call you inserted
in the IPA driver belongs there.

-Alex


Luis