Re: [PATCH v3 1/2] clk: Add generic sync_state callback for disabling unused clocks

From: Stephen Boyd
Date: Tue Feb 21 2023 - 14:24:45 EST


Quoting Bjorn Andersson (2023-02-20 08:21:37)
> On Fri, Feb 17, 2023 at 09:38:22PM -0800, Stephen Boyd wrote:
> > Quoting Abel Vesa (2022-12-27 12:45:27)
> > > There are unused clocks that need to remain untouched by clk_disable_unused,
> > > and most likely could be disabled later on sync_state. So provide a generic
> > > sync_state callback for the clock providers that register such clocks.
> > > Then, use the same mechanism as clk_disable_unused from that generic
> > > callback, but pass the device to make sure only the clocks belonging to
> > > the current clock provider get disabled, if unused. Also, during the
> > > default clk_disable_unused, if the driver that registered the clock has
> > > the generic clk_sync_state_disable_unused callback set for sync_state,
> > > skip disabling its clocks.
> >
> > How does that avoid disabling clks randomly in the clk tree? I'm
> > concerned about disabling an unused clk in the middle of the tree
> > because it doesn't have a driver using sync state, while the clk is the
> > parent of an unused clk that is backed by sync state.
> >
> > clk A --> clk B
> >
> > clk A: No sync state
> > clk B: sync state
> >
> > clk B is left on by the bootloader. __clk_disable_unused(NULL) is called
> > from late init. Imagine clk A is the root of the tree.
> >
> > clk_disable_unused_subtree(clk_core A)
> > clk_disable_unused_subtree(clk_core B)
> > if (from_sync_state && core->dev != dev)
> > return;
> > ...
> > clk core A->ops->disable()
> >
> > clk core B is off now?
> >
>
> I will have to give this some more thought. But this is exactly what we
> have today; consider A being any builtin clock driver and B being any
> clock driver built as modules, with relationship to A.
>
> clk_disable_unused() will take down A without waiting for B, possibly
> locking up parts of the clock hardware of B; or turning off the clocks
> to IP blocks that rely on those clocks (e.g. display).

Oh, thanks for clarifying! Yes, the disabling of unused clks with
respect to modules is broken in the same way. This makes that brokenness
equally apply to builtin drivers, making the situation worse, not better.

>
> So my thought on this is that I don't think this patch negatively alter
> the situation. But as it isn't recursive, this remains a problem that
> needs to be fixed.
>
> > Also sync_state seems broken right now. I saw mka mentioned that if you
> > have a device node enabled in your DT but never enable a driver for it
> > in the kernel we'll never get sync_state called. This is another
> > problem, but it concerns me that sync_state would make the unused clk
> > disabling happen at some random time or not at all.
> >
>
> I don't think that sync_state is "broken".
>
> There is no way to distinguish a driver not being built in, or a driver
> being built as module but not yet loaded. The approach taken by
> sync_state currently is optimistically speculative.
>
> One alternative to this is found in the regulator framework, where we
> have a 30 second timer triggering the late disable. The result of this
> is that every time I end up in the ramdisk console because "root file
> system can't be mounted", I have 25 second to figure out what the
> problem is before the backlight goes out...
>
> As such I do prefer the optimistic approach...
>
> > Can the problem be approached more directly? If this is about fixing
> > continuous splash screen, then I wonder why we can't list out the clks
> > that we know are enabled by the bootloader in some new DT binding, e.g.:
> >
> > clock-controller {
> > #clock-cells = <1>;
> > boot-handoff-clocks = <&consumer_device "clock cells for this clk provider">;
> > };
> >
>
> I was under the impression that we have ruled out this approach.

Where did we rule it out? I suppose we already have this information if
we search the DT for 'clocks' properties and read the clk hardware at
boot to see if it is enabled. Putting these two things together gets us
all the enabled clks at boot that are consumed by device nodes. It
doesn't tell use about anything not consumed in DT, or handle overlays,
but if we have that problem we can probably figure something out later.

>
> Presumably this list should not be a manually maintained list of display
> clocks, and that means the bootloader would need to go in and build
> this list of all enabled clocks. I don't think this is practical.

Why does the bootloader need to do that? The devicetree author can list
out the clks that they want to keep on for the display driver until the
display driver can acquire them.

>
> > Then mark those as "critical/don't turn off" all the way up the clk tree
> > when the clk driver probes by essentially incrementing the
> > prepare/enable count but not actually touching the hardware, and when
> > the clks are acquired by clk_get() for that device that's using them
> > from boot we make the first clk_prepare_enable() do nothing and not
> > increment the count at all. We can probably stick some flag into the
> > 'struct clk' for this when we create the handle in clk_get() so that the
> > prepare and enable functions can special case and skip over.
> >
>
> The benefit of sync_state is that it kicks when the client drivers has
> probed. As such, you can have e.g. the display driver clk_get(), then
> probe defer on some other resource, and the clock state can remain
> untouched.

Ok. I think this spitball design would do that still. It's not like we
would go and disable the clks that are handed to the display driver even
if it probe defers. The clk would be marked as enabled until the display
driver enables the clk, and then it wouldn't be disabled during late
init (or later) because the clk would be enabled either by the core or
by the display driver. The point where we transfer ownership of the
enable state is when the consumer calls clk_enable().

>
> > The sync_state hook operates on a driver level, which is too large when
> > you consider that a single clk driver may register hundreds of clks that
> > are not related. We want to target a solution at the clk level so that
> > any damage from keeping on all the clks provided by the controller is
> > limited to just the drivers that aren't probed and ready to handle their
> > clks. If sync_state could be called whenever a clk consumer consumes a
> > clk it may work? Technically we already have that by the clk_hw_provider
> > function but there isn't enough information being passed there, like the
> > getting device.
> >
>
> The current solution already operates on all clocks of all drivers, that
> happens to be probed at late_initcall(). This patch removes the
> subordinate clause from this, allowing clock drivers and their clients
> to be built as modules.
>
> So while it still operates on all clocks of a driver, it moves that
> point to a later stage, where that is more reasonable to do.
>

When we have clk drivers that provide clks to many different device
drivers, they all have to probe for any unused clks to be disabled.

>
>
> It would probably (haven't considered all aspects) if sync_state could
> prune the tree gradually, disabling the branches that are fully probed.
>
> But it wouldn't affect Matthias problem; e.g. if you forget to build the
> venus driver, sync_state won't happen for that branch of the tree.
> (Something that is arguably better than leaving all the clocks for that
> driver enabled)
>

I didn't follow this part.