Re: [PATCH] interconnect: Skip call into provider if initial bw is zero

From: Mike Tipton
Date: Mon Jan 23 2023 - 15:58:11 EST


On 1/19/2023 3:56 PM, Bryan O'Donoghue wrote:
On 19/01/2023 22:18, Vivek Aknurwar wrote:
Hi Bryan,
Thanks for taking time to review the patch.

On 1/13/2023 5:40 PM, Bryan O'Donoghue wrote:
On 14/01/2023 01:24, Bryan O'Donoghue wrote:
On 13/01/2023 22:07, Vivek Aknurwar wrote:
Currently framework sets bw even when init bw requirements are zero during
provider registration, thus resulting bulk of set bw to hw.
Avoid this behaviour by skipping provider set bw calls if init bw is zero.

Signed-off-by: Vivek Aknurwar <quic_viveka@xxxxxxxxxxx>
---
  drivers/interconnect/core.c | 17 ++++++++++-------
  1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 25debde..43ed595 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -977,14 +977,17 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
      node->avg_bw = node->init_avg;
      node->peak_bw = node->init_peak;
-    if (provider->pre_aggregate)
-        provider->pre_aggregate(node);
-
-    if (provider->aggregate)
-        provider->aggregate(node, 0, node->init_avg, node->init_peak,
-                    &node->avg_bw, &node->peak_bw);
+    if (node->avg_bw || node->peak_bw) {
+        if (provider->pre_aggregate)
+            provider->pre_aggregate(node);
+
+        if (provider->aggregate)
+            provider->aggregate(node, 0, node->init_avg, node->init_peak,
+                        &node->avg_bw, &node->peak_bw);
+        if (provider->set)
+            provider->set(node, node);
+    }
-    provider->set(node, node);
      node->avg_bw = 0;
      node->peak_bw = 0;

I have the same comment/question for this patch that I had for the qcom arch specific version of it. This patch seems to be doing at a higher level what the patch below was doing at a lower level.

https://lore.kernel.org/lkml/1039a507-c4cd-e92f-dc29-1e2169ce5078@xxxxxxxxxx/T/#m0c90588d0d1e2ab88c39be8f5f3a8f0b61396349

what happens to earlier silicon - qcom silicon which previously made explicit zero requests ?

This patch is to optimize and avoid all those bw 0 requests on each node addition during probe (which results in rpmh remote calls) for upcoming targets.

So why not change it just for rpmh ?

You are changing it for rpm here, as well as for Samsung and NXP interconnects.


This isn't actually changing it for all providers. Only for those that define the get_bw() callback. Right now that's only qcom/msm8974 and imx/imx. If get_bw() isn't defined, then icc_node_add() defaults to INT_MAX. So, the logical behavior in that case is unchanged. Which means this isn't even changing the behavior for rpmh yet, either.

We're also working on changes to align our downstream, qcom-specific, rpmh-specific sync-state approach with the common upstream approach. Part of which includes adding a get_bw() callback for rpmh that only returns non-zero BW for nodes already enabled from bootloaders or are otherwise marked as critical for HLOS operation (i.e. keepalive). Currently, the upstream rpmh driver doesn't define get_bw(), which means the framework votes INT_MAX for everything even if most of the nodes aren't needed yet.

Currently, with the upstream rpmh-based drivers this is just a performance/power optimization issue. It doesn't cause any functional failures. However, downstream we have additional nodes that use separate BCM voters than just the "apps" voter. These secondary voters aren't accessible when the providers probe, since they require additional regulator dependencies to be met first. We rely on the client voting for the required regulators before voting to interconnect for these nodes. So, we need to prevent the framework from calling our set() callbacks when adding these secondary nodes, otherwise it'll cause bus errors and crash the kernel. It's not always safe to assume that every node is immediately capable of being voted for when it's added.

We currently work around this by "stubbing" our pre_aggregate, aggregate, and set() callbacks when adding the nodes and only set them to the real callbacks after we've finished adding everything. But that stops being a valid workaround when we move to the upstream sync-state approach, since we're relying on the set() callback from icc_node_add() for placing the initial proxy votes for "keepalive" and other nodes already enable from boot.

I'm sure the secondary voters will make their way upstream some day, but not clear when yet. There are no upstream drivers in a state ready to use them yet anyway. But the other changes we're working on to add get_bw() to icc-rpmh providers to reduce the number of unnecessary calls during probe could go in sooner as an optimization.

It's not easy to implement this purely on the provider side, since we can't just always ignore zero votes. We need to honor zero votes that are made post-init so that things actually turn off. Thus, any logic that short-circuits the zero requests would need to be done only for the very first request. Each node would have to track if it's been called once already. And we'd have to spread that logic across pre_aggregate, aggregate, and set. There's isn't just one simple place to implement this on the provider side. This is much more easily handled on the framework side.


Taking rpm as an example, for certain generations of silicon we make an explicit zero call.

https://git.codelinaro.org/clo/la/kernel/msm-3.18/-/blob/LA.BR.1.2.9-00810-8x09.0/drivers/platform/msm/msm_bus/msm_bus_bimc.c#L1367

Here's the original RPM commit that sets a zero

https://git.codelinaro.org/clo/la/kernel/msm-3.18/-/commit/d91d108656a7a44a6dfcfb318a25d39c5418e54b
>>>>
https://lore.kernel.org/lkml/1039a507-c4cd-e92f-dc29-1e2169ce5078@xxxxxxxxxx/T/#m589e8280de470e038249bb362634221771d845dd

https://lkml.org/lkml/2023/1/3/1232

Isn't it a better idea to let lower layer drivers differentiate what they do ?

AFAIU lower layer driver can/should not differentiate between normal flow calls vs made as a result from probe/initialization of driver. Hence even bw 0 request is honored as like client in general wish to vote 0 as in an normal use case.

But surely if I vote zero, then I mean to vote zero ?

Do we know that for every architecture and for every different supported that ignoring a zero vote is the right thing to do ?

I don't think we do know that.


Relying on the existing behavior of icc_node_add() calling set() when the node's BW is already zero should be generally unnecessary. If the node is already physically disabled in HW, then disabling again should be a don't-care. And if the node is already physically enabled in HW, then get_bw() should logically return something non-zero for it. get_bw() is supposed to return the *current* BW. It's not always possible to know exactly what the BW is, so often the distinction may just be between zero and INT_MAX. But ultimately it would ideally return the actual current BW vote, such that the initial votes placed by icc_node_add() match the preexisting votes from boot and don't unnecessarily enable or dramatically increase BW of many nodes irrelevant for early kernel boot.

If the provider simply has no idea, then it can choose not to define the get_bw() callback and the framework will assume INT_MAX for everything. But if the provider wants to optimize the initial BW voting, it can define the get_bw() callback to inform the framework which nodes are already enabled and require proxy voting.

And relying on icc_node_add() calling set() for zero BW should also be unnecessary for cleaning up nodes enabled from boot that are no longer necessary. Because in either case if get_bw() returns non-zero or get_bw() isn't defined at all, then the framework has non-zero initial BW for them. And if no consumers explicitly vote for them, then they'll be disabled in icc_sync_state(). Sync-state is the proper place to disable resources no longer needed from boot.

https://lore.kernel.org/linux-arm-msm/20230116132152.405535-1-konrad.dybcio@xxxxxxxxxx/

I think for older rpm this is a departure from long existing logic.

Maybe its entirely benign but, IMO you should be proposing this change at the rpmh level only, not at the top level across multiple different interconnect arches.

---
bod