[RFC][PATCH] cpufreq: User/admin documentation update and consolidation

From: Rafael J. Wysocki
Date: Fri Feb 17 2017 - 20:42:10 EST


From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>

The user/admin documentation of cpufreq is badly outdated. It
conains stale and/or inaccurate information along with things
that are not particularly useful. Also, some of the important
pieces are missing from it.

For this reason, add a new user/admin document for cpufreq
containing current information to admin-guide and drop the old
outdated .txt documents it is replacing.

Since there will be more PM documents in admin-guide going forward,
create a separate directory for them and put the cpufreq document
in there right away.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
---

Something I've been working on for quite a while recently.

Comments welcome. In particular, please check if the information in the new
doc is as accurate as it should be (please note that it documents the current
state, so new code changes under discussion are not reflected by it).

Thanks,
Rafael

---
Documentation/admin-guide/index.rst | 1
Documentation/admin-guide/pm/cpufreq.rst | 699 +++++++++++++++++++++++++++++++
Documentation/admin-guide/pm/index.rst | 15
Documentation/cpu-freq/boost.txt | 93 ----
Documentation/cpu-freq/governors.txt | 301 -------------
Documentation/cpu-freq/user-guide.txt | 226 ----------
6 files changed, 715 insertions(+), 620 deletions(-)

Index: linux-pm/Documentation/admin-guide/pm/cpufreq.rst
===================================================================
--- /dev/null
+++ linux-pm/Documentation/admin-guide/pm/cpufreq.rst
@@ -0,0 +1,699 @@
+.. |struct| replace:: :c:type:`struct`
+
+=======================
+CPU Performance Scaling
+=======================
+
+::
+
+ Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
+
+The Concept of CPU Performance Scaling
+======================================
+
+The majority of modern processors are capable of operating in a number of
+different clock frequency and voltage configurations, often referred to as
+Operating Performance Points or P-states (in ACPI terminology). As a rule,
+the higher the clock frequency and the higher the voltage, the more instructions
+can be retired by the CPU over a unit of time, but also the higher the clock
+frequency and the higher the voltage, the more energy is consumed over a unit of
+time (or the more power is drawn) by the CPU in the given P-state. Therefore
+there is a natural tradeoff between the CPU capacity (the number of instructions
+that can be executed over a unit of time) and the power drawn by the CPU.
+
+In some situations it is desirable or even necessary to run the program as fast
+as possible and then there is no reason to use any P-states different from the
+highest one (i.e. the highest-performance frequency/voltage configuration
+available). In some other cases, however, it may not be necessary to execute
+instructions so quickly and maintaining the highest available CPU capacity for a
+relatively long time without utilizing it entirely may be regarded as wasteful.
+It also may not be physically possible to maintain maximum CPU capacity for too
+long for thermal or power supply capacity reasons or similar. To cover those
+cases, there are hardware interfaces allowing CPUs to be switched between
+different frequency/voltage configurations or (in the ACPI terminology) to be
+put into different P-states.
+
+Typically, they are used along with algorithms to estimate the required CPU
+capacity, so as to decide which P-states to put the CPUs into. Of course, since
+the utilization of the system generally changes over time, that has to be done
+repeatedly on a regular basis. The activity by which this happens is referred
+to as CPU performance scaling or CPU frequency scaling (because it involves
+adjusting the CPU clock frequency).
+
+
+CPU Performance Scaling in Linux
+================================
+
+The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
+(CPU Frequency scaling) subsystem that consists of three layers of code: the
+core, scaling governors and scaling drivers.
+
+The ``CPUFreq`` core provides the common code infrastructure and user space
+interfaces for all platforms that support CPU performance scaling. It defines
+the basic framework in which the other components operate.
+
+Scaling governors implement algorithms to estimate the required CPU capacity.
+As a rule, each governor implements one, possibly parametrized, scaling
+algorithm.
+
+Scaling drivers talk to the hardware. They provide scaling governors with
+information on the available P-states (or P-state ranges in some cases) and
+access platform-specific hardware interfaces to change CPU P-states as requested
+by scaling governors.
+
+In principle, all available scaling governors can be used with every scaling
+driver. That design is based on the observation that the information used by
+performance scaling algorithms for P-state selection can be represented in a
+platform-independent form in the majority of cases, so it should be possible
+to use the same performance scaling algorithm implemented in exactly the same
+way regardless of which scaling driver is used. Consequently, the same set of
+scaling governors should be suitable for every supported platform.
+
+However, that observation may not hold for performance scaling algorithms
+based on information provided by the hardware itself, for example through
+feedback registers, as that information is typically specific to the hardware
+interface it comes from and may not be easily represented in an abstract,
+platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers
+to bypass the governor layer and implement their own performance scaling
+algorithms. That is done by the ``intel_pstate`` scaling driver.
+
+
+``CPUFreq`` Policy Objects
+==========================
+
+In some cases the hardware interface for P-state control is shared by multiple
+CPUs. That is, for example, the same register (or set of registers) is used to
+control the P-state of multiple CPUs at the same time and writing to it affects
+all of those CPUs simultaneously.
+
+Sets of CPUs sharing hardware P-state control interfaces are represented by
+``CPUFreq`` as |struct| :c:type:`cpufreq_policy` objects. For consistency,
+|struct| :c:type:`cpufreq_policy` is also used when there is only one CPU in the
+given set.
+
+The ``CPUFreq`` core maintains a pointer to a |struct| :c:type:`cpufreq_policy`
+object for every CPU in the system, including CPUs that are currently offline.
+If multiple CPUs share the same hardware P-state control interface, all of the
+pointers corresponding to them point to the same |struct|
+:c:type:`cpufreq_policy` object.
+
+``CPUFreq`` uses |struct| :c:type:`cpufreq_policy` as its basic data type and
+the design of its user space interface is based on the policy concept.
+
+
+CPU Initialization
+==================
+
+First of all, a scaling driver has to be registered for ``CPUFreq`` to work.
+It is only possible to register one scaling driver at a time, so the scaling
+driver is expected to be able to handle all CPUs in the system.
+
+The scaling driver may be registered before or after CPU registration. If
+CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to
+take a note of all of the already registered CPUs during the registration of the
+scaling driver. In turn, if any CPUs are registered after the registration of
+the scaling driver, the ``CPUFreq`` core will be invoked to take note of them
+at their registration time.
+
+In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
+has not seen so far as soon as it is ready to handle that CPU. [Note that the
+logical CPU may be a physical single-core processor, or a single core in a
+multicore processor, or a hardware thread in a physical processor or processor
+core. In what follows "CPU" always means "logical CPU" unless explicitly stated
+otherwise and the word "processor" is used to refer to the physical part
+possibly including multiple logical CPUs.]
+
+Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set
+for the given CPU and if so, it skips the policy object creation. Otherwise,
+a new policy object is created and initialized, which involves the creation of
+a new policy directory in ``sysfs``, and the policy pointer corresponding to
+the given CPU is set to the new policy object's address in memory.
+
+Next, the scaling driver's ``->init()`` callback is invoked with the policy
+pointer of the new CPU passed to it as the argument. If the policy object
+pointed to by it is new, that callback is expected to initialize the performance
+scaling hardware interface for the given CPU (or, more precisely, for the set of
+CPUs sharing the hardware interface it belongs to, represented by its policy
+object) and to set parameters of the policy, like the minimum and maximum
+frequencies supported by the hardware, the table of available frequencies (if
+the set of supported P-states is not a continuous range), and the mask of CPUs
+that belong to the same policy. That mask is then used by the core to populate
+the policy pointers for all of the CPUs in it.
+
+The next major initialization step for a new policy object is to attach a
+scaling governor to it (to begin with, that is the default scaling governor
+determined by the kernel configuration, but it may be changed later
+via ``sysfs``). First, a pointer to the new policy object is passed to the
+governor's ``->init()`` callback which is expected to initialize all of the
+data structures necessary to handle the given policy and, possibly, to add
+a governor ``sysfs`` interface to it. Next, the governor is started by
+invoking its ``->start()`` callback.
+
+That callback it expected to register per-CPU utilization update callbacks for
+all of the online CPUs belonging to the given policy with the CPU scheduler.
+The utilization update callbacks will be invoked by the CPU scheduler on
+important events, like task enqueue and dequeue, on every iteration of the
+scheduler tick or generally whenever the CPU utilization may change (from the
+scheduler's perspective). They are expected to carry out computations needed
+to determine the P-state to use for the given policy going forward and to
+invoke the scaling driver to make changes to the hardware in accordance with
+the P-state selection. The scaling driver may be invoked directly from
+scheduler context or asynchronously, via a kernel thread or workqueue, depending
+on the configuration and capabilities of the scaling driver and the governor.
+
+Similar steps are taken for policy objects that are not new, but were "inactive"
+previously, meaning that all of the CPUs belonging to them were offline. The
+only practical difference in that case is that the ``CPUFreq`` core will attempt
+to use the scaling governor previously used with the policy that became
+"inactive" (and is re-initialized now) instead of the default governor.
+
+In turn, if a previously offline CPU is being brought back online, but some
+other CPUs sharing the policy object with it are online already, there is no
+need to re-initialize the policy object at all. In that case, it only is
+necessary to restart the scaling governor so that it can take the new online CPU
+into account. That is achieved by invoking the governor's ``->stop`` and
+``->start()`` callbacks, in this order, for the entire policy.
+
+As mentioned before, the ``intel_pstate`` scaling driver bypasses the scaling
+governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
+Consequently, if ``intel_pstate`` is used, scaling governors are not attached to
+new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked
+to register per-CPU utilization update callbacks for each policy. These
+callbacks are invoked by the CPU scheduler in the same way as for scaling
+governors, but in the ``intel_pstate`` case they both determine the P-state to
+use and change the hardware configuration accordingly in one go from scheduler
+context.
+
+The policy objects created during CPU initialization and other data structures
+associated with them are torn down when the scaling driver is unregistered
+(which happens when the kernel module containing it is unloaded, for example) or
+when the last CPU belonging to the given policy in unregistered.
+
+
+Policy Interface in ``sysfs``
+=============================
+
+During the initialization of the kernel, the ``CPUFreq`` core creates a
+``sysfs`` directory (kobject) called ``cpufreq`` under
+:file:`/sys/devices/system/cpu/`.
+
+That directory contains a ``policyX`` subdirectory (where ``X`` represents an
+integer number) for every policy object maintained by the ``CPUFreq`` core.
+Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links
+under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
+that may be different from the one represented by ``X``) for all of the CPUs
+associated with (or belonging to) the given policy. The ``policyX`` directories
+in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
+attributes (files) to control ``CPUFreq`` behavior for the corresponding policy
+objects (that is, for all of the CPUs associated with them).
+
+Some of those attributes are generic. They are created by the ``CPUFreq`` core
+and their behavior generally does not depend on what scaling driver is in use
+and what scaling governor is attached to the given policy. Some scaling drivers
+also add driver-specific attributes to the policy directories in ``sysfs`` to
+control policy-specific aspects of driver behavior.
+
+The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
+are the following:
+
+``affected_cpus``
+ List of online CPUs belonging to this policy (i.e. sharing the hardware
+ performance scaling interface represented by the ``policyX`` policy
+ object).
+
+``bios_limit``
+ If the platform firmware (BIOS) tells the OS to apply an upper limit to
+ CPU frequencies, that limit will be reported through this attribute (if
+ present).
+
+ The existence of the limit may be a result of some (often unintentional)
+ BIOS settings, restrictions coming from a service processor or another
+ BIOS/HW-based mechanisms.
+
+ This does not cover ACPI thermal limitations which can be discovered
+ through a generic thermal driver.
+
+ This attribute is not present if the scaling driver in use does not
+ support it.
+
+``cpuinfo_max_freq``
+ Maximum possible operating frequency the CPUs belonging to this policy
+ can run at (in kHz).
+
+``cpuinfo_min_freq``
+ Minimum possible operating frequency the CPUs belonging to this policy
+ can run at (in kHz).
+
+``cpuinfo_transition_latency``
+ The time it takes to switch the CPUs belonging to this policy from one
+ P-state to another, in nanoseconds.
+
+ If unknown or if known to be so high that the scaling driver does not
+ work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
+ will be returned by reads from this attribute.
+
+``related_cpus``
+ List of all (online and offline) CPUs belonging to this policy.
+
+``scaling_available_governors``
+ List of ``CPUFreq`` scaling governors present in the kernel that can
+ be attached to this policy or (if the ``intel_pstate`` scaling driver is
+ in use) list of scaling algorithms provided by the driver that can be
+ applied to this policy.
+
+ [Note that some governors are modular and it may be necessary to load a
+ kernel module for the governor held by it to become available and be
+ listed by this attribute.]
+
+``scaling_cur_freq``
+ Current frequency of all of the CPUs belonging to this policy (in kHz).
+
+ For the majority of scaling drivers, this is the frequency of the last
+ P-state requested by the driver from the hardware using the scaling
+ interface provided by it, which may or may not reflect the frequency
+ the CPU is actually running at (due to hardware design and other
+ limitations).
+
+ Some scaling drivers (e.g. ``intel_pstate``) attempt to provide
+ information more precisely reflecting the current CPU frequency through
+ this attribute, but that still may not be the exact current CPU
+ frequency as seen by the hardware at the moment.
+
+``scaling_driver``
+ The scaling driver currently in use.
+
+``scaling_governor``
+ The scaling governor currently attached to this policy or (if the
+ ``intel_pstate`` scaling driver is in use) the scaling algorithm
+ provided by the driver that is currently applied to this policy.
+
+ This attribute is read-write and writing to it will cause a new scaling
+ governor to be attached to this policy or a new scaling algorithm
+ provided by the scaling driver to be applied to it (in the
+ ``intel_pstate`` case), as indicated by the string written to this
+ attribute (which must be one of the names listed by the
+ ``scaling_available_governors`` attribute described above).
+
+``scaling_max_freq``
+ Maximum frequency the CPUs belonging to this policy are allowed to be
+ running at (in kHz).
+
+ This attribute is read-write and writing a string representing an
+ integer to it will cause a new limit to be set (it must not be lower
+ than the value of the ``scaling_min_freq`` attribute).
+
+``scaling_min_freq``
+ Minimum frequency the CPUs belonging to this policy are allowed to be
+ running at (in kHz).
+
+ This attribute is read-write and writing a string representing a
+ non-negative integer to it will cause a new limit to be set (it must not
+ be higher than the value of the ``scaling_max_freq`` attribute).
+
+``scaling_setspeed``
+ This attribute is functional only if the `userspace`_ scaling governor
+ is attached to the given policy.
+
+ It returns the last frequency requested by the governor (in kHz) or can
+ be written to in order to set a new frequency for the policy.
+
+
+Generic Scaling Governors
+=========================
+
+``CPUFreq`` provides generic scaling governors that can be used with all
+scaling drivers. As stated before, each of them implements a single, possibly
+parametrized, performance scaling algorithm.
+
+Scaling governors are attached to policy objects and different policy objects
+can be handled by different scaling governors at the same time (although that
+may lead to suboptimal results in some cases).
+
+The scaling governor for a given policy object can be changed at any time with
+the help of the ``scaling_governor`` policy attribute in ``sysfs``.
+
+Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
+algorithms implemented by them. Those attributes, referred to as governor
+tunables, can be either global (system-wide) or per-policy, depending on the
+scaling driver in use. If the driver requires governor tunables to be
+per-policy, they are located in a subdirectory of each policy directory.
+Otherwise, they are located in a subdirectory under
+:file:`/sys/devices/system/cpu/cpufreq/`. In either case the name of the
+subdirectory containing the governor tunables is the name of the governor
+providing them.
+
+``performance``
+---------------
+
+When attached to a policy object, this governor causes the highest frequency,
+within the ``scaling_max_freq`` policy limit, to be requested for that policy.
+
+The request is made once at that time the governor for the policy is set to
+``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
+policy limits change after that.
+
+``powersave``
+-------------
+
+When attached to a policy object, this governor causes the lowest frequency,
+within the ``scaling_min_freq`` policy limit, to be requested for that policy.
+
+The request is made once at that time the governor for the policy is set to
+``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
+policy limits change after that.
+
+``userspace``
+-------------
+
+This governor does not do anything by itself. Instead, it allows user space
+to set the CPU frequency for the policy it is attached to by writing to the
+``scaling_setspeed`` attribute of that policy.
+
+``schedutil``
+-------------
+
+This governor uses CPU utilization data available from the CPU scheduler. It
+generally is regarded as a part of the CPU scheduler, so it can access the
+scheduler's internal data structures directly.
+
+It runs entirely in scheduler context, although in some cases it may need to
+invoke the scaling driver asynchronously when it decides that the CPU frequency
+should be changed for a given policy (that depends on whether or not the driver
+is capable of changing the CPU frequency from scheduler context).
+
+The actions of this governor for a particular CPU depend on the scheduling class
+invoking its utilization update callback for that CPU. If it is invoked by the
+RT or deadline scheduling classes, the governor will increase the frequency to
+the allowed maximum (that is, the ``scaling_max_freq`` policy limit). In turn,
+if it is invoked by the CFS scheduling class, the governor will use the
+Per-Entity Load Tracking (PELT) metric for the root control group of the
+given CPU as the CPU utilization estimate (see the `Per-entity load tracking`_
+LWN.net article for a description of the PELT mechanism). Then, the new
+CPU frequency to apply is computed in accordance with the formula
+
+ f = 1.25 * f_0 * util / max
+
+where util is the PELT number, max is the theoretical maximum of util, and f_0
+is either the maximum possible CPU frequency for the given policy (if the PELT
+number is frequency-invariant), or the current frequency of the CPU (otherwise).
+
+This governor also employs a mechanism allowing it to temporarily bump up the
+CPU frequency for tasks that have been waiting on I/O most recently, called
+"IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
+is passed by the scheduler to the governor callback which causes the frequency
+to go up to the allowed maximum immediately and then draw back to the value
+returned by the above formula over time.
+
+This governor exposes only one tunable:
+
+``rate_limit_us``
+ Minimum time (in microseconds) that has to pass between two consecutive
+ runs of governor computations (default: 1000 times the scaling driver's
+ transition latency).
+
+ Its purpose is to reduce the scheduler context overhead of this
+ governor which might be excessive without it.
+
+It generally is regarded as a replacement for the older `ondemand`_ and
+`conservative`_ governors (described below), as it is simpler and more tightly
+integrated with the CPU scheduler, its overhead in terms of CPU context switches
+and similar is less significant, and it uses the scheduler's own CPU utilization
+metric, so in principle its decisions should not contradict the decisions made
+by the other parts of the scheduler.
+
+``ondemand``
+------------
+
+This governor uses CPU load as a CPU frequency selection metric.
+
+In order to estimate the current CPU load, it measures the time elapsed between
+consecutive invocations of its worker routine and computes the fraction of that
+time in which the given CPU was not idle. The ratio of the non-idle (active)
+time to the total CPU time is taken as an estimate of the load.
+
+If this governor is attached to a policy shared by multiple CPUs, the load is
+estimated for all of them and the greatest result is taken as the load estimate
+for the entire policy.
+
+The worker routine of this governor has to run in process context, so it is
+invoked asynchronously (via a workqueue) and CPU P-states are updated from
+there if necessary. As a result, the scheduler context overhead from this
+governor is minimum, but it causes additional CPU context switches to happen
+relatively often and the CPU P-state updates triggered by it can be relatively
+irregular. Also, it affects its own CPU load metric by running code that
+reduces the CPU idle time (even though the CPU idle time is only reduced very
+slightly by it).
+
+It generally selects CPU frequencies proportional to the estimated load, so that
+the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of
+1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute
+corresponds to the load of 0, unless when the load exceeds a (configurable)
+speedup threshold, in which case it will go straight for the highest frequency
+it is allowed to use (the ``scaling_max_freq`` policy limit).
+
+This governor exposes the following tunables:
+
+``sampling_rate``
+ This is how often the governor's worker routine should run, in
+ microseconds.
+
+ Typically, it is set to values of the order of 10000 (10 ms). Its
+ default value is equal to the value of ``cpuinfo_transition_latency``
+ for each policy this governor is attached to (but since the unit here
+ is greater by 1000, this means that the time represented by
+ ``sampling_rate`` is 1000 times greater than the transition latency by
+ default).
+
+ If this tunable is per-policy, the following shell command sets the time
+ represented by it to be 750 times as high as the transition latency::
+
+ # echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
+
+
+``min_sampling_rate``
+ The minimum value of ``sampling_rate``.
+
+ Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and
+ :c:data:`tick_nohz_active` are both set or to 20 times the value of
+ :c:data:`jiffies` in microseconds otherwise.
+
+``up_threshold``
+ If the estimated CPU load is above this value (in percent), the governor
+ will set the frequency to the maximum value allowed for the policy.
+ Otherwise, the selected frequency will be proportional to the estimated
+ CPU load.
+
+``ignore_nice_load``
+ If set to 1 (default 0), it will cause the CPU load estimation code to
+ treat the CPU time spent on executing tasks with "nice" levels greater
+ than 0 as CPU idle time.
+
+ This may be useful if there are tasks in the system that should not be
+ taken into account when deciding what frequency to run the CPUs at.
+ Then, to make that happen it is sufficient to increase the "nice" level
+ of those tasks above 0 and set this attribute to 1.
+
+``sampling_down_factor``
+ Temporary multiplier, between 1 (default) and 100 inclusive, to apply to
+ the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
+
+ This causes the next execution of the governor's worker routine (after
+ setting the frequency to the allowed maximum) to be delayed, so the
+ frequency stays at the maximum level for a longer time.
+
+ Frequency fluctuations in some bursty workloads may be avoided this way
+ at the cost of additional energy spent on maintaining the maximum CPU
+ capacity.
+
+``powersave_bias``
+ Reduction factor to apply to the original frequency target of the
+ governor (including the maximum value used when the ``up_threshold``
+ value is exceeded by the estimated CPU load) or sensitivity threshold
+ for the AMD frequency sensitivity powersave bias driver
+ (:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000
+ inclusive.
+
+ If the AMD frequency sensitivity powersave bias driver is not loaded,
+ the effective frequency to apply is given by
+
+ f * (1 - ``powersave_bias`` / 1000)
+
+ where f is the governor's original frequency target. The default value
+ of this attribute is 0 in that case.
+
+ If the AMD frequency sensitivity powersave bias driver is loaded, the
+ value of this attribute is 400 by default and it is used in a different
+ way.
+
+ On Family 16h (and later) AMD processors there is a mechanism to get a
+ measured workload sensitivity, between 0 and 100% inclusive, from the
+ hardware. That value can be used to estimate how the performance of the
+ workload running on a CPU will change in response to frequency changes.
+
+ The performance of a workload with the sensitivity of 0 (memory-bound or
+ IO-bound) is not expected to increase at all as a result of increasing
+ the CPU frequency, whereas workloads with the sensitivity of 100%
+ (CPU-bound) are expected to perform much better if the CPU frequency is
+ increased.
+
+ If the workload sensitivity is less than the threshold represented by
+ the ``powersave_bias`` value, the sensitivity powersave bias driver
+ will cause the governor to select a frequency lower than its original
+ target, so as to avoid over-provisioning workloads that will not benefit
+ from running at higher CPU frequencies.
+
+``conservative``
+----------------
+
+This governor uses CPU load as a CPU frequency selection metric.
+
+It estimates the CPU load in the same way as the `ondemand`_ governor described
+above, but the CPU frequency selection algorithm implemented by it is different.
+
+Namely, it avoids changing the frequency significantly over short time intervals
+which may not be suitable for systems with limited power supply capacity (e.g.
+battery-powered). To achieve that, it changes the frequency in relatively
+small steps, one step at a time, up or down - depending on whether or not a
+(configurable) threshold has been exceeded by the estimated CPU load.
+
+This governor exposes the following tunables:
+
+``freq_step``
+ Frequency step in percent of the maximum frequency the governor is
+ allowed to set (the ``scaling_max_freq`` policy limit), between 0 and
+ 100 (5 by default).
+
+ This is how much the frequency is allowed to change in one go. Setting
+ it to 0 will cause the default frequency step (5 percent) to be used
+ and setting it to 100 effectively causes the governor to periodically
+ switch the frequency between the ``scaling_min_freq`` and
+ ``scaling_max_freq`` policy limits.
+
+``down_threshold``
+ Threshold value (in percent, 20 by default) used to determine the
+ frequency change direction.
+
+ If the estimated CPU load is greater than this value, the frequency will
+ go up (by ``freq_step``). If the load is less than this value (and the
+ ``sampling_down_factor`` mechanism is not in effect), the frequency will
+ go down. Otherwise, the frequency will not be changed.
+
+``sampling_down_factor``
+ Frequency decrease deferral factor, between 1 (default) and 10
+ inclusive.
+
+ It effectively causes the frequency to go down ``sampling_down_factor``
+ times slower than it ramps up.
+
+
+Frequency Boost Support
+=======================
+
+Background
+----------
+
+Some processors support a mechanism to raise the operating frequency of some
+cores in a multicore package temporarily (and above the sustainable frequency
+threshold for the whole package) under certain conditions, for example if the
+whole chip is not fully utilized and below its intended thermal or power budget.
+
+Different names are used by different vendors to refer to this functionality.
+For Intel processors it is referred to as "Turbo Boost", AMD calls it
+"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
+As a rule, it also is implemented differently by different vendors. The simple
+term "frequency boost" is used here for brevity to refer to all of those
+implementations.
+
+The frequency boost mechanism may be either hardware-based or software-based.
+If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
+made by the hardware (although in general it requires the hardware to be put
+into a special state in which it can control the CPU frequency within certain
+limits). If it is software-based (e.g. on ARM), the scaling driver decides
+whether or not to trigger boosting and when to do that.
+
+The ``boost`` File in ``sysfs``
+-------------------------------
+
+This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
+the "boost" setting for the whole system. It is not present if the underlying
+scaling driver does not support the frequency boost mechanism (or supports it,
+but provides a driver-specific interface for controlling it, like
+``intel_pstate``).
+
+If the value in this file is 1, the frequency boost mechanism is enabled. This
+means that either the hardware can be put into states in which it is able to
+trigger boosting (in the hardware-based case), or the software is allowed to
+trigger boosting (in the software-based case). It does not mean that boosting
+is actually in use at the moment on any CPUs in the system. It only means a
+permission to use the frequency boost mechanism (which still may never be used
+for other reasons).
+
+If the value in this file is 0, the frequency boost mechanism is disabled and
+cannot be used at all.
+
+The only values that can be written to this file are 0 and 1.
+
+Rationale for Boost Control Knob
+--------------------------------
+
+The frequency boost mechanism is generally intended to help to achieve optimum
+CPU performance on time scales below software resolution (e.g. below the
+scheduler tick interval) and it is demonstrably suitable for many workloads, but
+it may lead to problems in certain situations.
+
+For this reason, many systems make it possible to disable the frequency boost
+mechanism in the platform firmware (BIOS) setup, but that requires the system to
+be restarted for the setting to be adjusted as desired, which may not be
+practical at least in some cases. For example:
+
+ 1. Boosting means overclocking the processor, although under controlled
+ conditions. Generally, the processor's energy consumption increases
+ as a result of increasing its frequency and voltage, even temporarily.
+ That may not be desirable on systems that switch to power sources of
+ limited capacity, such as batteries, so the ability to disable the boost
+ mechanism while the system is running may help there (but that depends on
+ the workload too).
+
+ 2. In some situations deterministic behavior is more important than
+ performance or energy consumption (or both) and the ability to disable
+ boosting while the system is running may be useful then.
+
+ 3. To examine the impact of the frequency boost mechanism itself, it is useful
+ to be able to run tests with and without boosting, preferably without
+ restarting the system in the meantime.
+
+ 4. Reproducible results are important when running benchmarks. Since
+ the boosting functionality depends on the load of the whole package,
+ single-thread performance may vary because of it which may lead to
+ unreproducible results sometimes. That can be avoided by disabling the
+ frequency boost mechanism before running benchmarks sensitive to that
+ issue.
+
+Legacy AMD ``cpb`` Knob
+-----------------------
+
+The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
+the global ``boost`` one. It is used for disabling/enabling the "Core
+Performance Boost" feature of some AMD processors.
+
+If present, that knob is located in every ``CPUFreq`` policy directory in
+``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
+``cpb``, which indicates a more fine grained control interface. The actual
+implementation, however, works on the system-wide basis and setting that knob
+for one policy causes the same value of it to be set for all of the other
+policies at the same time.
+
+That knob is still supported on AMD processors that support its underlying
+hardware feature, but it may be configured out of the kernel (via the
+:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global
+``boost`` knob is present regardless. Thus it is always possible use the
+``boost`` knob instead of the ``cpb`` one which is highly recommended, as that
+is more consistent with what all of the other systems do (and the ``cpb`` knob
+may not be supported any more in the future).
+
+The ``cpb`` knob is never present for any processors without the underlying
+hardware feature (e.g. all Intel ones), even if the
+:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set.
+
+
+.. _Per-entity load tracking: https://lwn.net/Articles/531853/
Index: linux-pm/Documentation/admin-guide/pm/index.rst
===================================================================
--- /dev/null
+++ linux-pm/Documentation/admin-guide/pm/index.rst
@@ -0,0 +1,15 @@
+================
+Power Management
+================
+
+.. toctree::
+ :maxdepth: 1
+
+ cpufreq
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
Index: linux-pm/Documentation/admin-guide/index.rst
===================================================================
--- linux-pm.orig/Documentation/admin-guide/index.rst
+++ linux-pm/Documentation/admin-guide/index.rst
@@ -60,6 +60,7 @@ configure specific aspects of kernel beh
mono
java
ras
+ pm/index

.. only:: subproject and html

Index: linux-pm/Documentation/cpu-freq/boost.txt
===================================================================
--- linux-pm.orig/Documentation/cpu-freq/boost.txt
+++ /dev/null
@@ -1,93 +0,0 @@
-Processor boosting control
-
- - information for users -
-
-Quick guide for the impatient:
---------------------
-/sys/devices/system/cpu/cpufreq/boost
-controls the boost setting for the whole system. You can read and write
-that file with either "0" (boosting disabled) or "1" (boosting allowed).
-Reading or writing 1 does not mean that the system is boosting at this
-very moment, but only that the CPU _may_ raise the frequency at it's
-discretion.
---------------------
-
-Introduction
--------------
-Some CPUs support a functionality to raise the operating frequency of
-some cores in a multi-core package if certain conditions apply, mostly
-if the whole chip is not fully utilized and below it's intended thermal
-budget. The decision about boost disable/enable is made either at hardware
-(e.g. x86) or software (e.g ARM).
-On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
-in technical documentation "Core performance boost". In Linux we use
-the term "boost" for convenience.
-
-Rationale for disable switch
-----------------------------
-
-Though the idea is to just give better performance without any user
-intervention, sometimes the need arises to disable this functionality.
-Most systems offer a switch in the (BIOS) firmware to disable the
-functionality at all, but a more fine-grained and dynamic control would
-be desirable:
-1. While running benchmarks, reproducible results are important. Since
- the boosting functionality depends on the load of the whole package,
- single thread performance can vary. By explicitly disabling the boost
- functionality at least for the benchmark's run-time the system will run
- at a fixed frequency and results are reproducible again.
-2. To examine the impact of the boosting functionality it is helpful
- to do tests with and without boosting.
-3. Boosting means overclocking the processor, though under controlled
- conditions. By raising the frequency and the voltage the processor
- will consume more power than without the boosting, which may be
- undesirable for instance for mobile users. Disabling boosting may
- save power here, though this depends on the workload.
-
-
-User controlled switch
-----------------------
-
-To allow the user to toggle the boosting functionality, the cpufreq core
-driver exports a sysfs knob to enable or disable it. There is a file:
-/sys/devices/system/cpu/cpufreq/boost
-which can either read "0" (boosting disabled) or "1" (boosting enabled).
-The file is exported only when cpufreq driver supports boosting.
-Explicitly changing the permissions and writing to that file anyway will
-return EINVAL.
-
-On supported CPUs one can write either a "0" or a "1" into this file.
-This will either disable the boost functionality on all cores in the
-whole system (0) or will allow the software or hardware to boost at will
-(1).
-
-Writing a "1" does not explicitly boost the system, but just allows the
-CPU to boost at their discretion. Some implementations take external
-factors like the chip's temperature into account, so boosting once does
-not necessarily mean that it will occur every time even using the exact
-same software setup.
-
-
-AMD legacy cpb switch
----------------------
-The AMD powernow-k8 driver used to support a very similar switch to
-disable or enable the "Core Performance Boost" feature of some AMD CPUs.
-This switch was instantiated in each CPU's cpufreq directory
-(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
-Though the per CPU existence hints at a more fine grained control, the
-actual implementation only supported a system-global switch semantics,
-which was simply reflected into each CPU's file. Writing a 0 or 1 into it
-would pull the other CPUs to the same state.
-For compatibility reasons this file and its behavior is still supported
-on AMD CPUs, though it is now protected by a config switch
-(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
-even with the config option set.
-This functionality is considered legacy and will be removed in some future
-kernel version.
-
-More fine grained boosting control
-----------------------------------
-
-Technically it is possible to switch the boosting functionality at least
-on a per package basis, for some CPUs even per core. Currently the driver
-does not support it, but this may be implemented in the future.
Index: linux-pm/Documentation/cpu-freq/governors.txt
===================================================================
--- linux-pm.orig/Documentation/cpu-freq/governors.txt
+++ /dev/null
@@ -1,301 +0,0 @@
- CPU frequency and voltage scaling code in the Linux(TM) kernel
-
-
- L i n u x C P U F r e q
-
- C P U F r e q G o v e r n o r s
-
- - information for users and developers -
-
-
- Dominik Brodowski <linux@xxxxxxxx>
- some additions and corrections by Nico Golde <nico@xxxxxxxxx>
- Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
- Viresh Kumar <viresh.kumar@xxxxxxxxxx>
-
-
-
- Clock scaling allows you to change the clock speed of the CPUs on the
- fly. This is a nice method to save battery power, because the lower
- the clock speed, the less power the CPU consumes.
-
-
-Contents:
----------
-1. What is a CPUFreq Governor?
-
-2. Governors In the Linux Kernel
-2.1 Performance
-2.2 Powersave
-2.3 Userspace
-2.4 Ondemand
-2.5 Conservative
-2.6 Schedutil
-
-3. The Governor Interface in the CPUfreq Core
-
-4. References
-
-
-1. What Is A CPUFreq Governor?
-==============================
-
-Most cpufreq drivers (except the intel_pstate and longrun) or even most
-cpu frequency scaling algorithms only allow the CPU frequency to be set
-to predefined fixed values. In order to offer dynamic frequency
-scaling, the cpufreq core must be able to tell these drivers of a
-"target frequency". So these specific drivers will be transformed to
-offer a "->target/target_index/fast_switch()" call instead of the
-"->setpolicy()" call. For set_policy drivers, all stays the same,
-though.
-
-How to decide what frequency within the CPUfreq policy should be used?
-That's done using "cpufreq governors".
-
-Basically, it's the following flow graph:
-
-CPU can be set to switch independently | CPU can only be set
- within specific "limits" | to specific frequencies
-
- "CPUfreq policy"
- consists of frequency limits (policy->{min,max})
- and CPUfreq governor to be used
- / \
- / \
- / the cpufreq governor decides
- / (dynamically or statically)
- / what target_freq to set within
- / the limits of policy->{min,max}
- / \
- / \
- Using the ->setpolicy call, Using the ->target/target_index/fast_switch call,
- the limits and the the frequency closest
- "policy" is set. to target_freq is set.
- It is assured that it
- is within policy->{min,max}
-
-
-2. Governors In the Linux Kernel
-================================
-
-2.1 Performance
----------------
-
-The CPUfreq governor "performance" sets the CPU statically to the
-highest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.2 Powersave
--------------
-
-The CPUfreq governor "powersave" sets the CPU statically to the
-lowest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.3 Userspace
--------------
-
-The CPUfreq governor "userspace" allows the user, or any userspace
-program running with UID "root", to set the CPU to a specific frequency
-by making a sysfs file "scaling_setspeed" available in the CPU-device
-directory.
-
-
-2.4 Ondemand
-------------
-
-The CPUfreq governor "ondemand" sets the CPU frequency depending on the
-current system load. Load estimation is triggered by the scheduler
-through the update_util_data->func hook; when triggered, cpufreq checks
-the CPU-usage statistics over the last period and the governor sets the
-CPU accordingly. The CPU must have the capability to switch the
-frequency very quickly.
-
-Sysfs files:
-
-* sampling_rate:
-
- Measured in uS (10^-6 seconds), this is how often you want the kernel
- to look at the CPU usage and to make decisions on what to do about the
- frequency. Typically this is set to values of around '10000' or more.
- It's default value is (cmp. with users-guide.txt): transition_latency
- * 1000. Be aware that transition latency is in ns and sampling_rate
- is in us, so you get the same sysfs value by default. Sampling rate
- should always get adjusted considering the transition latency to set
- the sampling rate 750 times as high as the transition latency in the
- bash (as said, 1000 is default), do:
-
- $ echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
-
-* sampling_rate_min:
-
- The sampling rate is limited by the HW transition latency:
- transition_latency * 100
-
- Or by kernel restrictions:
- - If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
- - If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is
- used, the limits depend on the CONFIG_HZ option:
- HZ=1000: min=20000us (20ms)
- HZ=250: min=80000us (80ms)
- HZ=100: min=200000us (200ms)
-
- The highest value of kernel and HW latency restrictions is shown and
- used as the minimum sampling rate.
-
-* up_threshold:
-
- This defines what the average CPU usage between the samplings of
- 'sampling_rate' needs to be for the kernel to make a decision on
- whether it should increase the frequency. For example when it is set
- to its default value of '95' it means that between the checking
- intervals the CPU needs to be on average more than 95% in use to then
- decide that the CPU frequency needs to be increased.
-
-* ignore_nice_load:
-
- This parameter takes a value of '0' or '1'. When set to '0' (its
- default), all processes are counted towards the 'cpu utilisation'
- value. When set to '1', the processes that are run with a 'nice'
- value will not count (and thus be ignored) in the overall usage
- calculation. This is useful if you are running a CPU intensive
- calculation on your laptop that you do not care how long it takes to
- complete as you can 'nice' it and prevent it from taking part in the
- deciding process of whether to increase your CPU frequency.
-
-* sampling_down_factor:
-
- This parameter controls the rate at which the kernel makes a decision
- on when to decrease the frequency while running at top speed. When set
- to 1 (the default) decisions to reevaluate load are made at the same
- interval regardless of current clock speed. But when set to greater
- than 1 (e.g. 100) it acts as a multiplier for the scheduling interval
- for reevaluating load when the CPU is at its top speed due to high
- load. This improves performance by reducing the overhead of load
- evaluation and helping the CPU stay at its top speed when truly busy,
- rather than shifting back and forth in speed. This tunable has no
- effect on behavior at lower speeds/lower CPU loads.
-
-* powersave_bias:
-
- This parameter takes a value between 0 to 1000. It defines the
- percentage (times 10) value of the target frequency that will be
- shaved off of the target. For example, when set to 100 -- 10%, when
- ondemand governor would have targeted 1000 MHz, it will target
- 1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
- (disabled) by default.
-
- When AMD frequency sensitivity powersave bias driver --
- drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
- defines the workload frequency sensitivity threshold in which a lower
- frequency is chosen instead of ondemand governor's original target.
- The frequency sensitivity is a hardware reported (on AMD Family 16h
- Processors and above) value between 0 to 100% that tells software how
- the performance of the workload running on a CPU will change when
- frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
- will not perform any better on higher core frequency, whereas a
- workload with sensitivity of 100% (CPU-bound) will perform better
- higher the frequency. When the driver is loaded, this is set to 400 by
- default -- for CPUs running workloads with sensitivity value below
- 40%, a lower frequency is chosen. Unloading the driver or writing 0
- will disable this feature.
-
-
-2.5 Conservative
-----------------
-
-The CPUfreq governor "conservative", much like the "ondemand"
-governor, sets the CPU frequency depending on the current usage. It
-differs in behaviour in that it gracefully increases and decreases the
-CPU speed rather than jumping to max speed the moment there is any load
-on the CPU. This behaviour is more suitable in a battery powered
-environment. The governor is tweaked in the same manner as the
-"ondemand" governor through sysfs with the addition of:
-
-* freq_step:
-
- This describes what percentage steps the cpu freq should be increased
- and decreased smoothly by. By default the cpu frequency will increase
- in 5% chunks of your maximum cpu frequency. You can change this value
- to anywhere between 0 and 100 where '0' will effectively lock your CPU
- at a speed regardless of its load whilst '100' will, in theory, make
- it behave identically to the "ondemand" governor.
-
-* down_threshold:
-
- Same as the 'up_threshold' found for the "ondemand" governor but for
- the opposite direction. For example when set to its default value of
- '20' it means that if the CPU usage needs to be below 20% between
- samples to have the frequency decreased.
-
-* sampling_down_factor:
-
- Similar functionality as in "ondemand" governor. But in
- "conservative", it controls the rate at which the kernel makes a
- decision on when to decrease the frequency while running in any speed.
- Load for frequency increase is still evaluated every sampling rate.
-
-
-2.6 Schedutil
--------------
-
-The "schedutil" governor aims at better integration with the Linux
-kernel scheduler. Load estimation is achieved through the scheduler's
-Per-Entity Load Tracking (PELT) mechanism, which also provides
-information about the recent load [1]. This governor currently does
-load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks
-are always run at the highest frequency. Unlike all the other
-governors, the code is located under the kernel/sched/ directory.
-
-Sysfs files:
-
-* rate_limit_us:
-
- This contains a value in microseconds. The governor waits for
- rate_limit_us time before reevaluating the load again, after it has
- evaluated the load once.
-
-For an in-depth comparison with the other governors refer to [2].
-
-
-3. The Governor Interface in the CPUfreq Core
-=============================================
-
-A new governor must register itself with the CPUfreq core using
-"cpufreq_register_governor". The struct cpufreq_governor, which has to
-be passed to that function, must contain the following values:
-
-governor->name - A unique name for this governor.
-governor->owner - .THIS_MODULE for the governor module (if appropriate).
-
-plus a set of hooks to the functions implementing the governor's logic.
-
-The CPUfreq governor may call the CPU processor driver using one of
-these two functions:
-
-int cpufreq_driver_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation);
-
-int __cpufreq_driver_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation);
-
-target_freq must be within policy->min and policy->max, of course.
-What's the difference between these two functions? When your governor is
-in a direct code path of a call to governor callbacks, like
-governor->start(), the policy->rwsem is still held in the cpufreq core,
-and there's no need to lock it again (in fact, this would cause a
-deadlock). So use __cpufreq_driver_target only in these cases. In all
-other cases (for example, when there's a "daemonized" function that
-wakes up every second), use cpufreq_driver_target to take policy->rwsem
-before the command is passed to the cpufreq driver.
-
-4. References
-=============
-
-[1] Per-entity load tracking: https://lwn.net/Articles/531853/
-[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/
-
Index: linux-pm/Documentation/cpu-freq/user-guide.txt
===================================================================
--- linux-pm.orig/Documentation/cpu-freq/user-guide.txt
+++ /dev/null
@@ -1,226 +0,0 @@
- CPU frequency and voltage scaling code in the Linux(TM) kernel
-
-
- L i n u x C P U F r e q
-
- U S E R G U I D E
-
-
- Dominik Brodowski <linux@xxxxxxxx>
-
-
-
- Clock scaling allows you to change the clock speed of the CPUs on the
- fly. This is a nice method to save battery power, because the lower
- the clock speed, the less power the CPU consumes.
-
-
-Contents:
----------
-1. Supported Architectures and Processors
-1.1 ARM and ARM64
-1.2 x86
-1.3 sparc64
-1.4 ppc
-1.5 SuperH
-1.6 Blackfin
-
-2. "Policy" / "Governor"?
-2.1 Policy
-2.2 Governor
-
-3. How to change the CPU cpufreq policy and/or speed
-3.1 Preferred interface: sysfs
-
-
-
-1. Supported Architectures and Processors
-=========================================
-
-1.1 ARM and ARM64
------------------
-
-Almost all ARM and ARM64 platforms support CPU frequency scaling.
-
-1.2 x86
--------
-
-The following processors for the x86 architecture are supported by cpufreq:
-
-AMD Elan - SC400, SC410
-AMD mobile K6-2+
-AMD mobile K6-3+
-AMD mobile Duron
-AMD mobile Athlon
-AMD Opteron
-AMD Athlon 64
-Cyrix Media GXm
-Intel mobile PIII and Intel mobile PIII-M on certain chipsets
-Intel Pentium 4, Intel Xeon
-Intel Pentium M (Centrino)
-National Semiconductors Geode GX
-Transmeta Crusoe
-Transmeta Efficeon
-VIA Cyrix 3 / C3
-various processors on some ACPI 2.0-compatible systems [*]
-And many more
-
-[*] Only if "ACPI Processor Performance States" are available
-to the ACPI<->BIOS interface.
-
-
-1.3 sparc64
------------
-
-The following processors for the sparc64 architecture are supported by
-cpufreq:
-
-UltraSPARC-III
-
-
-1.4 ppc
--------
-
-Several "PowerBook" and "iBook2" notebooks are supported.
-
-
-1.5 SuperH
-----------
-
-All SuperH processors supporting rate rounding through the clock
-framework are supported by cpufreq.
-
-1.6 Blackfin
-------------
-
-The following Blackfin processors are supported by cpufreq:
-
-BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher
-BF531, BF532, BF533, Rev 0.3 or higher
-BF534, BF536, BF537, Rev 0.2 or higher
-BF561, Rev 0.3 or higher
-BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher
-
-
-2. "Policy" / "Governor" ?
-==========================
-
-Some CPU frequency scaling-capable processor switch between various
-frequencies and operating voltages "on the fly" without any kernel or
-user involvement. This guarantees very fast switching to a frequency
-which is high enough to serve the user's needs, but low enough to save
-power.
-
-
-2.1 Policy
-----------
-
-On these systems, all you can do is select the lower and upper
-frequency limit as well as whether you want more aggressive
-power-saving or more instantly available processing power.
-
-
-2.2 Governor
-------------
-
-On all other cpufreq implementations, these boundaries still need to
-be set. Then, a "governor" must be selected. Such a "governor" decides
-what speed the processor shall run within the boundaries. One such
-"governor" is the "userspace" governor. This one allows the user - or
-a yet-to-implement userspace program - to decide what specific speed
-the processor shall run at.
-
-
-3. How to change the CPU cpufreq policy and/or speed
-====================================================
-
-3.1 Preferred Interface: sysfs
-------------------------------
-
-The preferred interface is located in the sysfs filesystem. If you
-mounted it at /sys, the cpufreq interface is located in a subdirectory
-"cpufreq" within the cpu-device directory
-(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU).
-
-affected_cpus : List of Online CPUs that require software
- coordination of frequency.
-
-cpuinfo_cur_freq : Current frequency of the CPU as obtained from
- the hardware, in KHz. This is the frequency
- the CPU actually runs at.
-
-cpuinfo_min_freq : this file shows the minimum operating
- frequency the processor can run at(in kHz)
-
-cpuinfo_max_freq : this file shows the maximum operating
- frequency the processor can run at(in kHz)
-
-cpuinfo_transition_latency The time it takes on this CPU to
- switch between two frequencies in nano
- seconds. If unknown or known to be
- that high that the driver does not
- work with the ondemand governor, -1
- (CPUFREQ_ETERNAL) will be returned.
- Using this information can be useful
- to choose an appropriate polling
- frequency for a kernel governor or
- userspace daemon. Make sure to not
- switch the frequency too often
- resulting in performance loss.
-
-related_cpus : List of Online + Offline CPUs that need software
- coordination of frequency.
-
-scaling_available_frequencies : List of available frequencies, in KHz.
-
-scaling_available_governors : this file shows the CPUfreq governors
- available in this kernel. You can see the
- currently activated governor in
-
-scaling_cur_freq : Current frequency of the CPU as determined by
- the governor and cpufreq core, in KHz. This is
- the frequency the kernel thinks the CPU runs
- at.
-
-scaling_driver : this file shows what cpufreq driver is
- used to set the frequency on this CPU
-
-scaling_governor, and by "echoing" the name of another
- governor you can change it. Please note
- that some governors won't load - they only
- work on some specific architectures or
- processors.
-
-scaling_min_freq and
-scaling_max_freq show the current "policy limits" (in
- kHz). By echoing new values into these
- files, you can change these limits.
- NOTE: when setting a policy you need to
- first set scaling_max_freq, then
- scaling_min_freq.
-
-scaling_setspeed This can be read to get the currently programmed
- value by the governor. This can be written to
- change the current frequency for a group of
- CPUs, represented by a policy. This is supported
- currently only by the userspace governor.
-
-bios_limit : If the BIOS tells the OS to limit a CPU to
- lower frequencies, the user can read out the
- maximum available frequency from this file.
- This typically can happen through (often not
- intended) BIOS settings, restrictions
- triggered through a service processor or other
- BIOS/HW based implementations.
- This does not cover thermal ACPI limitations
- which can be detected through the generic
- thermal driver.
-
-If you have selected the "userspace" governor which allows you to
-set the CPU operating frequency to a specific value, you can read out
-the current frequency in
-
-scaling_setspeed. By "echoing" a new frequency into this
- you can change the speed of the CPU,
- but only within the limits of
- scaling_min_freq and scaling_max_freq.