Re: [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

From: Pierre Gondois
Date: Thu Sep 07 2023 - 14:22:05 EST


Hello Shrikanth,

On 9/6/23 13:35, Shrikanth Hegde wrote:


On 9/5/23 7:33 PM, Pierre Gondois wrote:
Hello Shrikanth,
I tried the patch (on a platform using the cppc_cpufreq driver). The
platform
normally has EAS enabled, but the patch removed the sched_energy_aware
sysctl.
It seemed the following happened (in the below order):

1. sched_energy_aware_sysctl_init()
Doesn't set sysctl_sched_energy_aware as cpufreq_freq_invariance isn't set
and arch_scale_freq_invariant() returns false

2. cpufreq_register_driver()
Sets cpufreq_freq_invariance during cpufreq initialization
sched_energy_set()

3. sched_energy_set()
Is called with has_eas=0 since build_perf_domains() doesn't see the
platform
as EAS compatible. Indeed sysctl_sched_energy_aware=0.
So with sysctl_sched_energy_aware=0 and has_eas=0, sched_energy_aware
sysctl
is not enabled even though EAS should be possible.


On 9/1/23 08:52, Shrikanth Hegde wrote:
Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
some of the architectures. IIUC its meant to either force rebuild the
perf domains or cleanup the perf domains by echoing 1 or 0 respectively.

There is a definition of the sysctl at:
Documentation/admin-guide/sysctl/kernel.rst::sched_energy_aware
[...]


+static unsigned int sysctl_sched_energy_aware;
+static struct ctl_table_header *sysctl_eas_header;

The variables around the presence/absence of EAS are:
- sched_energy_present:
EAS is up and running

- sysctl_sched_energy_aware:
The user wants to use EAS (or not). Doesn't mean EAS can run on the
platform.

- sched_energy_set/partition_sched_domains_locked's "has_eas":
Local variable. Represent whether EAS can run on the platform.

IMO it would be simpler to (un)register sched_energy_aware sysctl
in partition_sched_domains_locked(), based on the value of "has_eas".
This would allow to let all the logic as it is right now, inside
build_perf_domains(), and then advertise sched_energy_aware sysctl
if EAS can run on the platform.
sched_energy_aware_sysctl_init() would be deleted then.


yes. that is true. and there is no variable which holds the info if the system
is capable of EAS.

Retrospecting, the reason for starting this patch series was this,
sysctl_sched_energy_aware didnt make sense on power10 platform since it
has SMT and symmetric CPU capacities. with current code writing 1 to
it cause rebuild of sched domains but EAS wouldn't be possible.


Possible Approaches:

1.
Make this sysctl write as NOP if the platform doesn't has EAS capabilities at
the moment. Do those checks in sched_energy_aware_handler before handling the
change in value. Return EINVAL. And Update sysctl description that on such
platforms value change is NOP. Relatively simpler change.

2.
Current patch approach, remove the sysctl completely on non supported
architectures and re-enable it if the system becomes capable of doing EAS.
With the current patch, instead of using sched_energy_update, use another
variable called sched_energy_change_in_sysctl(maybe different name). I think
that would handle all the cases. Another variable can be avoided by encoding
the info in sysctl_sched_energy_aware itself in the handler call, since it
takes only 1 or 0 as the value. upper bits are free to use. update the sysctl
as well with this behavior. plus minor cleanup to remove the init of sysctl.


FWIW I think it makes more sense to remove the sysctl if EAS isn't possible on
the platform, as this patch intends to do.
From a code perspective I m not sure I follow exactly your intend. I can test
your v3 if necessary,

Regards,
Pierre