Re: [PATCH v4] bus: mhi: host: don't free bhie tables during suspend/hibernation

From: Jeff Hugo
Date: Fri May 16 2025 - 10:51:36 EST

Next message: Manivannan Sadhasivam: "Re: [PATCH v8 2/9] PCI: stm32: Add PCIe host support for STM32MP25"
Previous message: mlevitsk: "Re: [PATCH v4 3/4] x86: nVMX: check vmcs12->guest_ia32_debugctl value given by L2"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 5/14/2025 1:17 AM, Muhammad Usama Anjum wrote:

On 5/13/25 8:16 PM, Jeff Hugo wrote:

On 5/13/2025 12:44 AM, Muhammad Usama Anjum wrote:

On 5/12/25 11:46 PM, Jeff Hugo wrote:

On 5/6/2025 8:49 AM, Muhammad Usama Anjum wrote:

Fix dma_direct_alloc() failure at resume time during bhie_table
allocation because of memory pressure. There is a report where at
resume time, the memory from the dma doesn't get allocated and MHI
fails to re-initialize.

To fix it, don't free the memory at power down during suspend /
hibernation. Instead, use the same allocated memory again after every
resume / hibernation. This patch has been tested with resume and
hibernation both.

The rddm is of constant size for a given hardware. While the fbc_image
size depends on the firmware. If the firmware changes, we'll free and
allocate new memory for it.

Why is it valid to load new firmware as a result of suspend? I don't
users would expect that.

I'm not sure its valid or not. Like other users, I also don't expect
that firmware would get changed. It doesn't seem to be tested and hence
supported case.

But other drivers have code which have implementation like this. I'd
mentioned previously that this patch was motivated from the ath12k [1]
and ath11k [2] patches. They don't free the memory and reuse the same
memory if new size is same.

It feels like this justification needs to be detailed in the commit
text. I suspect at some point we'll have another MHI device where the FW
will need to be cached.

Okay. I'll add this information to the commit message. Currently I've
not seen firmware caching being used other than graphics driver.

diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index efa3b6dddf4d2..bc8459798bbee 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -584,10 +584,17 @@ void mhi_fw_load_handler(struct mhi_controller
*mhi_cntrl)
        * device transitioning into MHI READY state
        */
       if (fw_load_type == MHI_FW_LOAD_FBC) {

Why is this FBC specific?

It seems we allocate fbc_image only when firmware load type is
FW_LOAD_FBC. I'm just optimizing the buffer allocation here.

We alloc bhie tables in non FBC usecases. Is this somehow an FBC
specific issue? Perhaps you could clarify the limits of this solution in
the commit text?

Okay. I'll add information that we are optimizing the bhie allocations.
It has nothing to do with firmware type. I've found only 2 bhie
allocations; fbc_image and rddm_image. So we are optimizing those.

There is a 3rd allocation, and it has everything to do with firmware type. Did you miss mhi_load_image_bhie()? I'm not asking you to touch mhi_load_image_bhie(), but to recognize that what you are doing is specific to some BHIe devices, not all.

-        ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image,
fw_sz);
-        if (ret) {
-            release_firmware(firmware);
-            goto error_fw_load;
+        if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) {
+            mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image);
+            mhi_cntrl->fbc_image = NULL;
+        }
+        if (!mhi_cntrl->fbc_image) {
+            ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl-

fbc_image, fw_sz);

+            if (ret) {
+                release_firmware(firmware);
+                goto error_fw_load;
+            }
+            mhi_cntrl->prev_fw_sz = fw_sz;
           }
             /* Load the firmware into BHIE vec table */
diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c
index e6c3ff62bab1d..107d71b4cc51a 100644
--- a/drivers/bus/mhi/host/pm.c
+++ b/drivers/bus/mhi/host/pm.c
@@ -1259,10 +1259,19 @@ void mhi_power_down(struct mhi_controller
*mhi_cntrl, bool graceful)
   }
   EXPORT_SYMBOL_GPL(mhi_power_down);
   +static void __mhi_power_down_unprepare_keep_dev(struct
mhi_controller *mhi_cntrl)
+{
+    mhi_cntrl->bhi = NULL;
+    mhi_cntrl->bhie = NULL;

Why?

This function is shorter version of mhi_unprepare_after_power_down(). As
we need different code path in case of suspend/hibernation case, I was
adding a new API which Mani asked me remove and consolidate into
mhi_power_down_keep_dev() instead. So this static function has been
added. [3]

I don't understand the need to zero these out. Also, if you are copying
part of the functionality of mhi_unprepare_after_power_down(), shouldn't
that functionality be moved into your new API to eliminate duplication?

This how the cleanup works mhi_unprepare_after_power_down(). Yeah, it
makes sense to use this function in mhi_unprepare_after_power_down().

Sending next version soon.

Next message: Manivannan Sadhasivam: "Re: [PATCH v8 2/9] PCI: stm32: Add PCIe host support for STM32MP25"
Previous message: mlevitsk: "Re: [PATCH v4 3/4] x86: nVMX: check vmcs12->guest_ia32_debugctl value given by L2"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]