Re: [RFC PATCH v2 1/2] PM: domains: Skip disabling unused domains if provider has sync_state

From: Bjorn Andersson
Date: Mon Feb 20 2023 - 12:12:28 EST


On Tue, Feb 07, 2023 at 03:45:35PM -0800, Doug Anderson wrote:
> Hi,
>
> On Mon, Feb 6, 2023 at 1:35 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >
> > On Mon, Feb 6, 2023 at 1:10 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@xxxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > CC'ed Saravana
> > > >
> > > > Thanks. Please do cc me for stuff like this from the start. I skimmed
> > > > the series and I think it's doing one of my TODO items. So, thanks for
> > > > the patch!
> > > >
> > > > I'll take a closer look within a few days -- trying to get through
> > > > some existing fw_devlink stuff.
> > > >
> > > > But long story short, it is the right thing to keep a supplier on
> > > > indefinitely if there's a consumer device (that's not disabled in DT)
> > > > that never gets probed. It's a pretty common scenario -- for example,
> > > > say a display backlight. The default case should be functional
> > > > correctness. And then we can add stuff that allows changing this
> > > > behavior with command line args or something else that can be done
> > > > from userspace.
> > > >
> > > > +1 to what Doug said elsewhere in this thread too. I'm trying to
> > > > consolidate the "when do we give up" decision at the driver core level
> > > > independent of what framework is being used.
> > >
> > > I'm not really sure I agree with the above, at least not without lots
> > > of discussion in the community. It really goes against what the kernel
> > > has been doing for years and years in the regulator and clock
> > > frameworks. Those frameworks both eventually give up and power down
> > > resources that no active drivers are using. Either changing the
> > > regulator/clock frameworks or saying that other frameworks should work
> > > in an opposite way seems like a recipe for confusion.
> > >
> > > Now, certainly I won't say that the way that the regulator and clock
> > > frameworks function is perfect nor will I say that they don't cause
> > > any problems. However, going the opposite way where resources are kept
> > > at full power indefinitely will _also_ cause problems.
> > >
> > > Specifically, let's look at the case you mentioned of a display
> > > backlight. I think you're saying that if there is no backlight driver
> > > enabled in the kernel that you'd expect the backlight to just be on at
> > > full brightness.
> >
> > No, I'm not saying that.
> >
> > > Would you expect this even if the firmware didn't
> > > leave the backlight on?
> >
> > sync_state() never turns on anything that wasn't already on at boot.
> > So in your example, if the firmware didn't turn on the backlight, then
> > it'll remain off.
>
> As per offline discussion, part of the problems are that today this
> _isn't_ true for a few Qualcomm things (like interconnect). The
> interconnect frameway specifically maxes things out for early boot.
>

The problem being solved here is that the bootloader leaves some vote at
1GB/s, as needed by hardware related to driver B.

Driver A is loaded first and votes for 1kb/s; what should the kernel do
now, without knowledge of the needs from the hardware associated with B,
or the ability to read back the bootloader's votes.

This was the behavior of the initial implementation, and the practical
implications was seen as the UART would typically come along really
early, cast a low vote on the various buses and it would take forever to
get to the probing of the drivers that actually gave us reasonable
votes.


Also consider the case where driver A probes, votes for bandwidth, does
it's initialization and then votes for 0. Without making assumptions
about the needs of B (or a potential B even), we'd turn off critical
resources - possible preventing us from ever attempting to probe B.



As such, the only safe solution is to assume that there might be a later
loaded/probed client that has a large vote and preemptively vote for
some higher bandwidth until then.

>
> > > In any case, why do you say it's more correct?
> >
> > Because if you turn off the display, the device is unusable. In other
> > circumstances, it can crash a device because the firmware powered it
> > on left it in a "good enough" state, but we'd go turn it off and crash
> > the system.
> >
> > > I suppose you'd say that the screen is at least usable like this.
> > > ...except that you've broken a different feature: suspend/resume.
> >
> > If the display is off and the laptop is unusable, then we have bigger
> > problems than suspend/resume?
>
> I suspect that here we'll have to agree to disagree. IMO it's a
> non-goal to expect hardware to work for which there is no driver. So
> making the backlight work without a backlight driver isn't really
> something we should strive for.
>

Without trying to make you agree ;)

How can you differentiate between "the driver wasn't built" and "the
driver isn't yet available"?

Consider the case where I boot my laptop, I have some set of builtin
drivers, some set of drivers in the ramdisk and some set of drivers in
the root filesystem.

In the event that something goes wrong mounting the rootfs, I will now
be in the ramdisk console. Given the current timer-based disabling of
regulators, I have ~25 seconds to solve my problem before the backlight
goes blank.


Obviously this isn't a typical scenario in a consumer device, but it
seems conceivable that your ramdisk would run fsck for some amount of
time before mounting the rootfs and picking up the last tier of drivers.

Regards,
Bjorn