Re: [PATCH v6 1/6] thermal: add generic cpufreq cooling implementation

From: Amit Kachhap
Date: Fri Aug 17 2012 - 03:58:34 EST


On 17 August 2012 12:54, Zhang Rui <rui.zhang@xxxxxxxxx> wrote:
> On Thu, 2012-08-16 at 17:11 +0530, Amit Daniel Kachhap wrote:
>> This patchset introduces a new generic cooling device based on cpufreq
>> that can be used on non-ACPI platforms. As a proof of concept, we have
>> drivers for the following platforms using this mechanism now:
>>
>> * Samsung Exynos (Exynos4 and Exynos5) in the current patchset.
>> * Freescale i.MX (git://git.linaro.org/people/amitdanielk/linux.git imx6q_thermal)
>>
>> There is a small change in cpufreq cooling registration APIs, so a minor
>> change is needed for Freescale platforms.
>>
>> Brief Description:
>>
>> 1) The generic cooling device code is placed inside drivers/thermal/*,
>> as placing it inside the acpi folder would require enabling ACPI code
>> unnecessarily. This code is architecture independent.
>>
>> 2) This patchset adds generic cpu cooling low level implementation
>> through frequency clipping. In future, other cpu related cooling
>> devices may be added here. An ACPI version of this already exists
>> (drivers/acpi/processor_thermal.c), but this will be useful for
>> platforms like ARM using the generic thermal interface along with the
>> generic cpu cooling devices. The cooling device registration APIs
>> return cooling device pointers which can be easily bound to the
>> thermal zone trip points. The important APIs exposed are:
>>
>> a) struct thermal_cooling_device *cpufreq_cooling_register(
>> struct cpumask *clip_cpus)
>> b) void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>>
>> 3) Samsung exynos platform thermal implementation is done using the
>> generic cpu cooling APIs and the new trip type. The temperature sensor
>> driver present in the hwmon folder (registered as a hwmon driver) is moved
>> to the thermal folder and registered as a thermal driver.
>>
>> A simple data/control flow diagram is shown below:
>>
>> Core Linux thermal <-----> Exynos thermal interface <----- Temperature Sensor
>>          |                             |
>>         \|/                            |
>>  Cpufreq cooling device <---------------
>>
>> TODO:
>> *Will send the DT enablement patches later after the driver is merged.
>>
>> This patch:
>>
>> Add support for generic cpu thermal cooling low level implementations
>> using frequency scaling up/down based on the registration parameters.
>> Different cpu related cooling devices can be registered by the user and
>> the binding of these cooling devices to the corresponding trip points can
>> be easily done as the registration APIs return the cooling device pointer.
>> The users of these APIs are responsible for passing the clipping frequency.
>> The drivers can also register to receive notification about any cooling
>> action called.
>>
>> [akpm@xxxxxxxxxxxxxxxxxxxx: fix comment layout]
>> Signed-off-by: Amit Daniel Kachhap <amit.kachhap@xxxxxxxxxx>
>> Cc: Guenter Roeck <guenter.roeck@xxxxxxxxxxxx>
>> Cc: SangWook Ju <sw.ju@xxxxxxxxxxx>
>> Cc: Durgadoss <durgadoss.r@xxxxxxxxx>
>> Cc: Len Brown <lenb@xxxxxxxxxx>
>> Cc: Jean Delvare <khali@xxxxxxxxxxxx>
>> Cc: Kyungmin Park <kmpark@xxxxxxxxxxxxx>
>> Cc: Kukjin Kim <kgene.kim@xxxxxxxxxxx>
>> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Amit Daniel Kachhap <amit.daniel@xxxxxxxxxxx>
>> ---
>> Documentation/thermal/cpu-cooling-api.txt | 52 +++
>> drivers/thermal/Kconfig | 11 +
>> drivers/thermal/Makefile | 1 +
>> drivers/thermal/cpu_cooling.c | 512 +++++++++++++++++++++++++++++
>> include/linux/cpu_cooling.h | 79 +++++
>> 5 files changed, 655 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/thermal/cpu-cooling-api.txt
>> create mode 100644 drivers/thermal/cpu_cooling.c
>> create mode 100644 include/linux/cpu_cooling.h
>>
>> diff --git a/Documentation/thermal/cpu-cooling-api.txt b/Documentation/thermal/cpu-cooling-api.txt
>> new file mode 100644
>> index 0000000..a1f2a6b
>> --- /dev/null
>> +++ b/Documentation/thermal/cpu-cooling-api.txt
>> @@ -0,0 +1,52 @@
>> +CPU cooling APIs How To
>> +===================================
>> +
>> +Written by Amit Daniel Kachhap <amit.kachhap@xxxxxxxxxx>
>> +
>> +Updated: 12 May 2012
>> +
>> +Copyright (c) 2012 Samsung Electronics Co., Ltd(http://www.samsung.com)
>> +
>> +0. Introduction
>> +
>> +The generic cpu cooling (freq clipping) provides registration/unregistration APIs
>> +to the caller. The binding of the cooling devices to the trip point is left to
>> +the user. The registration API returns the cooling device pointer.
>> +
>> +1. cpu cooling APIs
>> +
>> +1.1 cpufreq registration/unregistration APIs
>> +1.1.1 struct thermal_cooling_device *cpufreq_cooling_register(
>> + struct cpumask *clip_cpus)
>> +
>> + This interface function registers the cpufreq cooling device with the name
>> + "thermal-cpufreq-%x". This api can support multiple instances of cpufreq
>> + cooling devices.
>> +
>> + clip_cpus: cpumask of cpus where the frequency constraints will happen.
>> +
>> +1.1.2 void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>> +
>> + This interface function unregisters the "thermal-cpufreq-%x" cooling device.
>> +
>> + cdev: Cooling device pointer which has to be unregistered.
>> +
>> +
>> +1.2 CPU cooling action notifier register/unregister interface
>> +1.2.1 int cputherm_register_notifier(struct notifier_block *nb,
>> + unsigned int list)
>> +
>> + This interface registers a driver with cpu cooling layer. The driver will
>> + be notified when any cpu cooling action is called.
>> +
>> + nb: notifier function to register
>> + list: CPUFREQ_COOLING_START or CPUFREQ_COOLING_STOP
>> +
>> +1.2.2 int cputherm_unregister_notifier(struct notifier_block *nb,
>> + unsigned int list)
>> +
>> + This interface unregisters a driver from the cpu cooling layer. The driver
>> + will no longer be notified when a cpu cooling action is called.
>> +
>> + nb: notifier function to unregister
>> + list: CPUFREQ_COOLING_START or CPUFREQ_COOLING_STOP
>
> what are these two APIs used for?
> I did not see they are used in your patch set, do I miss something?
No, currently they are not used by my patches. I added them on request
from Eduardo and others.
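
For what it's worth, a consumer would look roughly like the sketch below.
It is a userspace mock, not kernel code: mock_notifier, mock_register(),
mock_unregister() and mock_call_chain() are hypothetical stand-ins for
struct notifier_block and the blocking notifier chain, and the
COOLING_START/COOLING_STOP values are assumed. The only thing taken from
the patch is that the callback gets the event plus a pointer to the
clipped frequency.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed event values; stand-ins for CPUFREQ_COOLING_START/STOP. */
enum { COOLING_START = 0, COOLING_STOP = 1 };

/* Hypothetical userspace stand-in for struct notifier_block. */
struct mock_notifier {
	int (*call)(struct mock_notifier *nb, unsigned long event, void *data);
	struct mock_notifier *next;
};

static struct mock_notifier *chain_head;

/* Add a notifier at the head of the chain. */
static int mock_register(struct mock_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = chain_head;
	chain_head = nb;
	return 0;
}

/* Remove a notifier; returns -1 if it was not on the chain. */
static int mock_unregister(struct mock_notifier *nb)
{
	struct mock_notifier **pp;

	for (pp = &chain_head; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			return 0;
		}
	}
	return -1;
}

/* What the cooling layer would do on each state change. */
static void mock_call_chain(unsigned long event, void *data)
{
	struct mock_notifier *nb;

	for (nb = chain_head; nb; nb = nb->next)
		nb->call(nb, event, data);
}

/* Driver side: remember the last clip frequency and throttle state. */
static unsigned int last_clip_khz;
static int throttled;

static int driver_cooling_event(struct mock_notifier *nb,
				unsigned long event, void *data)
{
	(void)nb;
	last_clip_khz = *(unsigned int *)data;
	throttled = (event == COOLING_START);
	return 0;
}
```

A driver would register driver_cooling_event once at probe time and
unregister it on remove; in between it only reacts to the two events.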
>
>> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
>> index 7dd8c34..996003b 100644
>> --- a/drivers/thermal/Kconfig
>> +++ b/drivers/thermal/Kconfig
>> @@ -19,6 +19,17 @@ config THERMAL_HWMON
>> depends on HWMON=y || HWMON=THERMAL
>> default y
>>
>> +config CPU_THERMAL
>> + bool "generic cpu cooling support"
>> + depends on THERMAL && CPU_FREQ
>> + help
>> + This implements the generic cpu cooling mechanism through frequency
>> + reduction, cpu hotplug and any other ways of reducing temperature. An
>> + ACPI version of this already exists (drivers/acpi/processor_thermal.c).
>> + This will be useful for platforms using the generic thermal interface
>> + and not the ACPI interface.
>> + If you want this support, you should say Y here.
>> +
>> config SPEAR_THERMAL
>> bool "SPEAr thermal sensor driver"
>> depends on THERMAL
>> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
>> index fd9369a..aae59ad 100644
>> --- a/drivers/thermal/Makefile
>> +++ b/drivers/thermal/Makefile
>> @@ -3,5 +3,6 @@
>> #
>>
>> obj-$(CONFIG_THERMAL) += thermal_sys.o
>> +obj-$(CONFIG_CPU_THERMAL) += cpu_cooling.o
>> obj-$(CONFIG_SPEAR_THERMAL) += spear_thermal.o
>> obj-$(CONFIG_RCAR_THERMAL) += rcar_thermal.o
>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>> new file mode 100644
>> index 0000000..c42e557
>> --- /dev/null
>> +++ b/drivers/thermal/cpu_cooling.c
>> @@ -0,0 +1,512 @@
>> +/*
>> + * linux/drivers/thermal/cpu_cooling.c
>> + *
>> + * Copyright (C) 2012 Samsung Electronics Co., Ltd(http://www.samsung.com)
>> + * Copyright (C) 2012 Amit Daniel <amit.kachhap@xxxxxxxxxx>
>> + *
>> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; version 2 of the License.
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, write to the Free Software Foundation, Inc.,
>> + * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
>> + *
>> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/thermal.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/cpufreq.h>
>> +#include <linux/err.h>
>> +#include <linux/slab.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpu_cooling.h>
>> +
>> +/**
>> + * struct cpufreq_cooling_device
>> + * @id: unique integer value corresponding to each cpufreq_cooling_device
>> + * registered.
>> + * @cool_dev: thermal_cooling_device pointer to keep track of the
>> + * registered cooling device.
>> + * @cpufreq_state: integer value representing the current state of cpufreq
>> + * cooling devices.
>> + * @cpufreq_val: integer value representing the absolute value of the clipped
>> + * frequency.
>> + * @allowed_cpus: all the cpus involved for this cpufreq_cooling_device.
>> + * @node: list_head to link all cpufreq_cooling_device together.
>> + *
>> + * This structure is required for keeping information of each
>> + * cpufreq_cooling_device registered as a list whose head is represented by
>> + * cooling_cpufreq_list. In order to prevent corruption of this list a
>> + * mutex lock cooling_cpufreq_lock is used.
>> + */
>> +struct cpufreq_cooling_device {
>> + int id;
>> + struct thermal_cooling_device *cool_dev;
>> + unsigned int cpufreq_state;
>> + unsigned int cpufreq_val;
>> + struct cpumask allowed_cpus;
>> + struct list_head node;
>> +};
>> +static LIST_HEAD(cooling_cpufreq_list);
>> +static DEFINE_IDR(cpufreq_idr);
>> +
>> +static struct mutex cooling_cpufreq_lock;
>> +
>> +/* notify_device passes value to the CPUFREQ_ADJUST callback function. */
>> +#define NOTIFY_INVALID NULL
>> +struct cpufreq_cooling_device *notify_device;
>> +
>> +/* Head of the blocking notifier chain to inform about frequency clamping */
>> +static BLOCKING_NOTIFIER_HEAD(cputherm_state_notifier_list);
>> +
>> +/**
>> + * get_idr - function to get a unique id.
>> + * @idr: struct idr * handle used to create an id.
>> + * @id: int * value generated by this function.
>> + */
>> +static int get_idr(struct idr *idr, int *id)
>> +{
>> + int err;
>> +again:
>> + if (unlikely(idr_pre_get(idr, GFP_KERNEL) == 0))
>> + return -ENOMEM;
>> +
>> + mutex_lock(&cooling_cpufreq_lock);
>> + err = idr_get_new(idr, NULL, id);
>> + mutex_unlock(&cooling_cpufreq_lock);
>> +
>> + if (unlikely(err == -EAGAIN))
>> + goto again;
>> + else if (unlikely(err))
>> + return err;
>> +
>> + *id = *id & MAX_ID_MASK;
>> + return 0;
>> +}
>> +
>> +/**
>> + * release_idr - function to free the unique id.
>> + * @idr: struct idr * handle used for creating the id.
>> + * @id: int value representing the unique id.
>> + */
>> +static void release_idr(struct idr *idr, int id)
>> +{
>> + mutex_lock(&cooling_cpufreq_lock);
>> + idr_remove(idr, id);
>> + mutex_unlock(&cooling_cpufreq_lock);
>> +}
>> +
>> +/**
>> + * cputherm_register_notifier - Register a notifier with cpu cooling interface.
>> + * @nb: struct notifier_block * with callback info.
>> + * @list: integer value for which notification is needed. possible values are
>> + * CPUFREQ_COOLING_START and CPUFREQ_COOLING_STOP.
>> + *
>> + * This exported function registers a driver with cpu cooling layer. The driver
>> + * will be notified when any cpu cooling action is called.
>> + */
>> +int cputherm_register_notifier(struct notifier_block *nb, unsigned int list)
>> +{
>> + int ret = 0;
>> +
>> + switch (list) {
>> + case CPUFREQ_COOLING_START:
>> + case CPUFREQ_COOLING_STOP:
>> + ret = blocking_notifier_chain_register(
>> + &cputherm_state_notifier_list, nb);
>> + break;
>> + default:
>> + ret = -EINVAL;
>> + }
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(cputherm_register_notifier);
>> +
>> +/**
>> + * cputherm_unregister_notifier - Un-register a notifier.
>> + * @nb: struct notifier_block * with callback info.
>> + * @list: integer value for which notification is needed. values possible are
>> + * CPUFREQ_COOLING_START or CPUFREQ_COOLING_STOP.
>> + *
>> + * This exported function un-registers a driver with cpu cooling layer.
>> + */
>> +int cputherm_unregister_notifier(struct notifier_block *nb, unsigned int list)
>> +{
>> + int ret = 0;
>> +
>> + switch (list) {
>> + case CPUFREQ_COOLING_START:
>> + case CPUFREQ_COOLING_STOP:
>> + ret = blocking_notifier_chain_unregister(
>> + &cputherm_state_notifier_list, nb);
>> + break;
>> + default:
>> + ret = -EINVAL;
>> + }
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(cputherm_unregister_notifier);
>> +
>> +/* Below code defines functions to be used for cpufreq as cooling device */
>> +
>> +/**
>> + * is_cpufreq_valid - function to check if a cpu has frequency transition policy.
>> + * @cpu: cpu for which check is needed.
>> + */
>> +static int is_cpufreq_valid(int cpu)
>> +{
>> + struct cpufreq_policy policy;
>> + return !cpufreq_get_policy(&policy, cpu);
>> +}
>> +
>> +/**
>> + * get_cpu_frequency - get the absolute value of frequency from level.
>> + * @cpu: cpu for which frequency is fetched.
>> + * @level: level of frequency of the CPU
>> + * e.g level=1 --> 1st MAX FREQ, LEVEL=2 ---> 2nd MAX FREQ, .... etc
>> + */
>> +static unsigned int get_cpu_frequency(unsigned int cpu, unsigned long level)
>> +{
>> + int ret = 0, i = 0;
>> + unsigned long level_index;
>> + bool descend = false;
>> + struct cpufreq_frequency_table *table =
>> + cpufreq_frequency_get_table(cpu);
>> + if (!table)
>> + return ret;
>> +
>> + while (table[i].frequency != CPUFREQ_TABLE_END) {
>> + if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
>> + continue;
>> +
>> + /*check if table in ascending or descending order*/
>> + if ((table[i + 1].frequency != CPUFREQ_TABLE_END) &&
>> + (table[i + 1].frequency < table[i].frequency)
>> + && !descend) {
>> + descend = true;
>> + }
>> +
>> + /*return if level matched and table in descending order*/
>> + if (descend && i == level)
>> + return table[i].frequency;
>> + i++;
>> + }
>> + i--;
>> +
>> + if (level > i || descend)
>> + return ret;
>> + level_index = i - level;
>> +
>> + /*Scan the table in reverse order and match the level*/
>> + while (i >= 0) {
>> + if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
>> + continue;
>> + /*return if level matched*/
>> + if (i == level_index)
>> + return table[i].frequency;
>> + i--;
>> + }
>> + return ret;
>> +}
>> +
>> +/**
>> + * cpufreq_apply_cooling - function to apply frequency clipping.
>> + * @cpufreq_device: cpufreq_cooling_device pointer containing frequency
>> + * clipping data.
>> + * @cooling_state: value of the cooling state.
>> + */
>> +static int cpufreq_apply_cooling(struct cpufreq_cooling_device *cpufreq_device,
>> + unsigned long cooling_state)
>> +{
>> + unsigned int event, cpuid, clip_freq;
>> + struct cpumask *maskPtr = &cpufreq_device->allowed_cpus;
>> + unsigned int cpu = cpumask_any(maskPtr);
>> +
>> +
>> + /* Check if the old cooling action is same as new cooling action */
>> + if (cpufreq_device->cpufreq_state == cooling_state)
>> + return 0;
>> +
>> + clip_freq = get_cpu_frequency(cpu, cooling_state);
>> + if (!clip_freq)
>> + return -EINVAL;
>> +
>> + cpufreq_device->cpufreq_state = cooling_state;
>> + cpufreq_device->cpufreq_val = clip_freq;
>> + notify_device = cpufreq_device;
>> +
>> + if (cooling_state != 0)
>> + event = CPUFREQ_COOLING_START;
>> + else
>> + event = CPUFREQ_COOLING_STOP;
>> +
>> + blocking_notifier_call_chain(&cputherm_state_notifier_list,
>> + event, &clip_freq);
>> +
>> + for_each_cpu(cpuid, maskPtr) {
>> + if (is_cpufreq_valid(cpuid))
>> + cpufreq_update_policy(cpuid);
>> + }
>> +
>> + notify_device = NOTIFY_INVALID;
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * cpufreq_thermal_notifier - notifier callback for cpufreq policy change.
>> + * @nb: struct notifier_block * with callback info.
>> + * @event: value showing cpufreq event for which this function invoked.
>> + * @data: callback-specific data
>> + */
>> +static int cpufreq_thermal_notifier(struct notifier_block *nb,
>> + unsigned long event, void *data)
>> +{
>> + struct cpufreq_policy *policy = data;
>> + unsigned long max_freq = 0;
>> +
>> + if (event != CPUFREQ_ADJUST || notify_device == NOTIFY_INVALID)
>> + return 0;
>> +
>> + if (cpumask_test_cpu(policy->cpu, &notify_device->allowed_cpus))
>> + max_freq = notify_device->cpufreq_val;
>> +
>> + /* Never exceed user_policy.max*/
>> + if (max_freq > policy->user_policy.max)
>> + max_freq = policy->user_policy.max;
>> +
>> + if (policy->max != max_freq)
>> + cpufreq_verify_within_limits(policy, 0, max_freq);
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * cpufreq cooling device callback functions are defined below
>> + */
>> +
>> +/**
>> + * cpufreq_get_max_state - callback function to get the max cooling state.
>> + * @cdev: thermal cooling device pointer.
>> + * @state: fill this variable with the max cooling state.
>> + */
>> +static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
>> + unsigned long *state)
>> +{
>> + int ret = -EINVAL, i = 0;
>> + struct cpufreq_cooling_device *cpufreq_device;
>> + struct cpumask *maskPtr;
>> + unsigned int cpu;
>> + struct cpufreq_frequency_table *table;
>> +
>> + mutex_lock(&cooling_cpufreq_lock);
>> + list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
>> + if (cpufreq_device && cpufreq_device->cool_dev == cdev)
>> + break;
>> + }
>> + if (cpufreq_device == NULL)
>> + goto return_get_max_state;
>> +
>> + maskPtr = &cpufreq_device->allowed_cpus;
>> + cpu = cpumask_any(maskPtr);
>> + table = cpufreq_frequency_get_table(cpu);
>> + if (!table) {
>> + *state = 0;
>> + ret = 0;
>> + goto return_get_max_state;
>> + }
>> +
>> + while (table[i].frequency != CPUFREQ_TABLE_END) {
>> + if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
>> + continue;
>> + i++;
>> + }
>> + if (i > 0) {
>> + *state = --i;
>> + ret = 0;
>> + }
>> +
>> +return_get_max_state:
>> + mutex_unlock(&cooling_cpufreq_lock);
>> + return ret;
>> +}
>> +
>> +/**
>> + * cpufreq_get_cur_state - callback function to get the current cooling state.
>> + * @cdev: thermal cooling device pointer.
>> + * @state: fill this variable with the current cooling state.
>> + */
>> +static int cpufreq_get_cur_state(struct thermal_cooling_device *cdev,
>> + unsigned long *state)
>> +{
>> + int ret = -EINVAL;
>> + struct cpufreq_cooling_device *cpufreq_device;
>> +
>> + mutex_lock(&cooling_cpufreq_lock);
>> + list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
>> + if (cpufreq_device && cpufreq_device->cool_dev == cdev) {
>> + *state = cpufreq_device->cpufreq_state;
>> + ret = 0;
>> + break;
>> + }
>> + }
>> + mutex_unlock(&cooling_cpufreq_lock);
>> +
>
> as cpufreq may be changed in other places, e.g. via sysfs I/F, we should
> use the current cpu frequency to get the REAL cooling state, rather than
> using a cached value.

Yes, agreed. I will repost with your suggestion.
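
A rough sketch of the idea, assuming cooling state 0 means the highest
valid frequency: derive the state from the current frequency limit by
counting the valid table entries above it, instead of returning the
cached cpufreq_state. This is userspace C for illustration only.
freq_to_cooling_state(), TABLE_END and ENTRY_INVALID are hypothetical
names standing in for the cpufreq table markers, and the real callback
would take the limit from the cpufreq policy.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the cpufreq table markers (values are assumptions). */
#define TABLE_END	(~1u)
#define ENTRY_INVALID	(~0u)

/*
 * Map a frequency limit (kHz) to a cooling state. State 0 is the highest
 * valid frequency, state 1 the next one down, and so on. Works for tables
 * in ascending or descending order, assuming no duplicate entries.
 * Returns -1 if the table has no valid entry.
 */
static int freq_to_cooling_state(const unsigned int *table,
				 unsigned int max_khz)
{
	int valid = 0, above = 0;
	size_t i;

	assert(table != NULL);

	for (i = 0; table[i] != TABLE_END; i++) {
		if (table[i] == ENTRY_INVALID)
			continue;	/* the for loop still advances i */
		valid++;
		if (table[i] > max_khz)
			above++;
	}

	if (valid == 0)
		return -1;

	/* A limit below every entry clamps to the deepest state. */
	return above < valid ? above : valid - 1;
}
```

With this, a limit lowered through sysfs shows up as a non-zero cooling
state even if the thermal layer never asked for it.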

Thanks,
Amit
>
> thanks,
> rui
>
>