Re: [PATCH] cpufreq: suspend/resume governors with PM notifiers

From: Rafael J. Wysocki
Date: Sun Nov 17 2013 - 09:57:36 EST


On Sunday, November 17, 2013 01:52:15 PM viresh kumar wrote:
> On Sunday 17 November 2013 06:38 AM, Rafael J. Wysocki wrote:
> > On Saturday, November 16, 2013 08:47:24 PM Viresh Kumar wrote:
> >> Well that is pretty much doable.
> >
> > Not necessarily on all CPU models.
>
> Okay.. Just for my understanding, why?

If the graphics and processor are in one chip, the CPU may ignore your
perf bump up request for power balancing reasons.

> >> So PM_POST_HIBERNATION is called just before shutting off the system? And
> >> PM_POST_RESTORE is called after system is resumed from saved image?
> >
> > PM_POST_HIBERNATION is only called if there's an error during hibernation.
> > PM_POST_RESTORE is called as you said.
>
> Ahh I see. Thanks.
>
> > Also you have to remember that the _PREPARE PM notifiers are called before the
> > freezing of tasks when user space is still running, so disabling governors at
> > that point may lead to some weird behavior.
>
> Actually, good point. I hadn't thought about it earlier.
>
> And when I look at what could go wrong, I can't find much. The worst case is
> that we wouldn't switch to a frequency requested by a userspace daemon, and we
> wouldn't return an error either. But I feel we can let that happen. Not
> servicing a request after system suspend has started doesn't look that odd.
>
> Sysfs infrastructure is still preserved and so all that information would still
> be available.
>
> Do you see anything extra that might stop working?

Well, the code would be racy with the patch as is. User space might manipulate
the sysfs knobs in parallel with your PM notifiers, for example, and I'm not
entirely sure what can happen then. And the lock in there is pointless,
because it doesn't prevent any races from happening.
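
To make the ordering concrete, here is a minimal userspace sketch (not kernel
code) of the suspend sequence as described in this thread: the _PREPARE
notifiers first, then the freezing of tasks, then device suspend, then CPU
offline. The enum values and function names are made up for illustration and
do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of the suspend ordering discussed above. */
enum suspend_phase {
	PHASE_PM_PREPARE_NOTIFIERS,	/* PM_SUSPEND_PREPARE notifiers are called */
	PHASE_FREEZE_TASKS,		/* user space is being frozen */
	PHASE_SUSPEND_DEVICES,		/* device suspend callbacks run */
	PHASE_OFFLINE_CPUS,		/* nonboot CPUs are taken offline */
	PHASE_COUNT
};

/* True if user space may still be running while @phase executes. */
static bool userspace_may_run(enum suspend_phase phase)
{
	assert((int)phase >= 0 && phase < PHASE_COUNT);
	return phase <= PHASE_FREEZE_TASKS;
}

/*
 * A hook that stops governors at @phase can race with sysfs writers
 * only if user space may still be running at that point.
 */
static bool governor_stop_can_race_with_sysfs(enum suspend_phase phase)
{
	return userspace_may_run(phase);
}
```

With this model, governor_stop_can_race_with_sysfs(PHASE_PM_PREPARE_NOTIFIERS)
is true, while the same call for PHASE_SUSPEND_DEVICES is false, which is why a
hook that runs after the freezing of tasks avoids this particular race.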

> >>> Actually, we use CPU offline/online during system suspend/resume to avoid
> >>> having to do stuff like this from PM notifiers.
> >>
> >> I didn't get the logic behind this one..
> >
> > If we have to do special stuff from PM notifiers for CPU "suspend", we will be
> > better off by doing something entirely special instead of CPU offline down the
> > road. Which we may end up doing given the problems with frozen/not frozen in
> > the cpufreq core.
>
> An unrelated question here. Why are we offlining CPUs after suspending all the
> devices? The problem Nishanth mentioned was that he required a few devices,
> such as i2c, to be available when CPUs are going down. And there might be
> similar requirements in other places too. Was there any specific bottleneck
> due to which it is implemented this way?

No, this is because the ACPI spec mandates powering down devices before CPUs
during system suspend. The way it is done today, however, I think we don't
need to keep that ordering so strictly any more. We definitely don't need to
do that on non-ACPI systems.

So while I hate the PM notifiers idea (sorry, but that's how it goes), I think
it would be OK to suspend *some* devices after disabling CPUs (not all of them,
of course).

And as I said, I think it would be OK to introduce suspend/resume callbacks for
CPU devices and use those callbacks to work around the ordering issues, when
necessary. The main point is that the changes made for this purpose should
only affect systems where they are necessary and not everyone. I don't want
to change the way things work today in general in cpufreq too much unless they
are plain bugs that affect everyone.

> > We may introduce suspend_noirq and resume_noirq for cpu_subsys, for example,
> > and handle things from there. Or something similar. But slapping PM notifiers
> > on top of the existing code just because it appears to be easy (and making that
> > code even more overdesigned than it already is this way) doesn't seem quite
> > right.
> >
> > Now, Tianyu's patch extends Srivatsa's approach to governors, which
> > actually should have been done from the outset, so it is within the scope of
> > what we have already. It may not solve all of the problems, but it still makes
> > some progress and has little chance to introduce *new* problems at the same
> > time.
>
> I understand your point here. But this is what I feel:
>
> - I don't have any special affection for using PM notifiers :) .. It's just that
> I need some way for the cpufreq core to know that suspend has started. Maybe
> after freezing of tasks and before suspending devices.
>
> - I thought of adding something like a suspend-prepare for syscore_ops (You are
> the owner of all these frameworks and so our life is easy as we can discuss
> stuff with you directly :)).. But then I thought maybe we can use PM
> notifiers.. But it looks like we had better do that now?
>
> - I have concerns with Tianyu's patch as policies should be better taken care of
> in cpufreq core instead of passing them over to governors.

Well, this is all too tangled anyway, but quite frankly I'm not sure if it is
worth untangling at this point.

> And with the alternative solution I had, code is getting more and more dirty.
> And so I thought of doing something else.
>
> - Not all platforms have a problem with changing frequency during
> suspend/resume, and so we may not need to disable governors for all of them.
> We can probably add another field based on which we may or may not disable
> governors from PM or syscore notifiers.

What exactly is wrong with adding suspend/resume callbacks to cpu_subsys?

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.