Re: Fans at full speed after resume

From: Sonny Rao
Date: Wed May 15 2013 - 05:47:15 EST


On Tue, May 14, 2013 at 9:56 PM, Sonny Rao <sonnyrao@xxxxxxxxxxxx> wrote:
> On Tue, May 14, 2013 at 9:34 PM, Sonny Rao <sonnyrao@xxxxxxxxxxxx> wrote:
>> On Tue, May 14, 2013 at 9:29 PM, Zhang Rui <rui.zhang@xxxxxxxxx> wrote:
>>> On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
>>>> please
>>>>
>>>> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
>>>> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
>>>> > the kernel turns the system fans on to max speed after resuming from
>>>> > ram. Other people have noticed it as well, for example see
>>>> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
>>>> >
>>>> please check if this is a duplicate of bug
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=56591
>>> or you can try 3.10-rc1 to see if the problem still exists or not.
>>
>> OK, I patched in the fix from that bugzilla, commit
>> 928c5edbe6f7cb0d1c71bc2353d091bc5b114fe3,
>> but I'm still seeing the issue. I'll try 3.10-rc1 next.
>>
>
> 3.10-rc1 seems good.
> 3.9.2 is okay: the fans do seem to run more for a while after
> resume, but they eventually turn off.
> 3.8.13 still seems to be broken, with fans at maximum.
>

So, I did a reverse bisect between 3.9 and 3.9.1 and found that the
commit you mentioned does indeed fix the problem on 3.9. I also
double-checked that the problem still doesn't seem to be fixed on
3.8.13. So, I made a 3.8.13 version of the debug patch from the
bugzilla entry:
https://bugzilla.kernel.org/attachment.cgi?id=98671

With it applied, I never see thermal_cdev_update() getting called for
cdev 0 or cdev 1, yet their cur_state is 1 after resume. Perhaps
something else is enabling them?
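
In case it helps anyone compare notes on other machines, here is a rough
sketch of a helper for capturing the cooling-device states and zone
temperature before and after a suspend/resume cycle. The function name
dump_thermal_state and its optional root-directory argument are my own
invention for illustration; it only reads the standard sysfs files
(type, cur_state, temp) and assumes they are laid out as on the machine
described in this thread.

```sh
#!/bin/sh
# Hypothetical helper, not part of any kernel tooling.
# Prints one line per cooling device and per thermal zone.
# $1 is an optional root directory (assumption: defaults to the
# usual sysfs location; overriding it is only useful for dry runs).
dump_thermal_state() {
    root="${1:-/sys/class/thermal}"

    for dev in "$root"/cooling_device*; do
        # Skip the unexpanded glob or devices without a readable state.
        [ -r "$dev/cur_state" ] || continue
        printf '%s type=%s cur_state=%s\n' \
            "$(basename "$dev")" \
            "$(cat "$dev/type" 2>/dev/null || echo '?')" \
            "$(cat "$dev/cur_state")"
    done

    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        # temp is reported in millidegrees Celsius.
        printf '%s temp=%s\n' "$(basename "$zone")" "$(cat "$zone/temp")"
    done
}
```

One way to use it: run `dump_thermal_state > before.txt`, suspend and
resume, run `dump_thermal_state > after.txt`, then `diff before.txt
after.txt` to see which cooling devices changed state across the resume.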

>>>
>>> thanks,
>>> rui
>>>> > For example on the Samsung 550 Chromebook, we have one thermal zone
>>>> > and have 5 cooling_devices, 0-4, which correspond to 5 possible fan
>>>> > speeds. Under typical idle, only cooling_device4 and maybe
>>>> > cooling_device3 are active, depending on temperature:
>>>> >
>>>> > cat /sys/class/thermal/cooling_device[01234]/cur_state /sys/class/thermal/thermal_zone0/temp
>>>> > 0
>>>> > 0
>>>> > 0
>>>> > 0
>>>> > 1
>>>> > 57000
>>>> >
>>>> > however after a suspend/resume, we see that cooling_devices 0 and 1
>>>> > become active:
>>>> > cat /sys/class/thermal/cooling_device[01234]/cur_state /sys/class/thermal/thermal_zone0/temp
>>>> > 1
>>>> > 1
>>>> > 0
>>>> > 0
>>>> > 1
>>>> > 54000
>>>> >
>>>> > and it seems to stay that way, even though the temperature is low
>>>> > enough that the fan shouldn't be running at that speed. If I manually
>>>> > disable cooling_devices 0 and 1 then fan control works normally again.
>>>> >
>>>> > I started bisecting it and was able to do so up until this commit:
>>>> > commit 29b19e250434c6193c8b8e4c34c9c6284dd4f101
>>>> > Merge: 125c4c7 c072fed
>>>> > Author: Len Brown <len.brown@xxxxxxxxx>
>>>> > AuthorDate: Tue Oct 9 01:35:52 2012 -0400
>>>> > Commit: Len Brown <len.brown@xxxxxxxxx>
>>>> > CommitDate: Tue Oct 9 01:35:52 2012 -0400
>>>> >
>>>> > Merge branch 'release' of
>>>> > git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux into
>>>> > thermal
>>>> >
>>>> > unfortunately, I'm not able to successfully do a suspend/resume on the
>>>> > commits in that merge, so I wasn't able to bisect down to the exact
>>>> > commit.
>>>> >
>>>> > I did confirm that one parent of the merge is okay: commit
>>>> > 125c4c706b680c7831f0966ff873c1ad0354ec25 idr: rename MAX_LEVEL to
>>>> > MAX_IDR_LEVEL
>>>> >
>>>> > so I think it falls somewhere in this list of commits:
>>>> > c072fed95c9855a920c114d7fa3351f0f54ea06e...e3f25e6e5836c4790fbe395ff42e241f372d859d
>>>> >
>>>> > c072fed9 thermal: Exynos: Fix NULL pointer dereference in exynos_unregister_thermal()
>>>> > a4b6fec9 Thermal: Fix bug on cpu_cooling, cooling device's id conflict problem.
>>>> > 79e093c3 thermal: exynos: Use devm_* functions
>>>> > 17be868e ARM: exynos: add thermal sensor driver platform data support
>>>> > 7e0b55e6 thermal: exynos: register the tmu sensor with the kernel thermal layer
>>>> > f22d9c03c thermal: exynos5: add exynos5250 thermal sensor driver support
>>>> > c48cbba6 hwmon: exynos4: move thermal sensor driver to driver/thermal directory
>>>> > 02361418 thermal: add generic cpufreq cooling implementation
>>>> > a7a3b8c8 Fix a build error.
>>>> > 204dd1d3 thermal: Fix potential NULL pointer accesses
>>>> > 1e426ffdd thermal: add Renesas R-Car thermal sensor support
>>>> > 79a49168 thermal: fix potential out-of-bounds memory access
>>>> > f4a821ce6 Thermal: Introduce locking for cdev.thermal_instances list.
>>>> > 908b9fb79 Thermal: Unify the code for both active and passive cooling
>>>> > ce119f832 Thermal: Introduce simple arbitrator for setting device cooling state
>>>> > b5e4ae62 Thermal: List thermal_instance in thermal_cooling_device.
>>>> > cddf31b3b Thermal: Rename thermal_instance.node to thermal_instance.tz_node.
>>>> > 2d374139 Thermal: Rename thermal_zone_device.cooling_devices
>>>> > b81b6ba3 Thermal: rename structure thermal_cooling_device_instance to thermal_instance
>>>> > 4ae46befb Thermal: Introduce thermal_zone_trip_update()
>>>> > 1b7ddb84 Thermal: Remove tc1/tc2 in generic thermal layer.
>>>> > 601f3d424 Thermal: Introduce .get_trend() callback.
>>>> > 9d99842f9 Thermal: set upper and lower limits
>>>> > 74051ba5 Thermal: Introduce cooling states range support
>>>> >
>>>> > When I get time, I'll try to rebase those commits onto the IDR commit
>>>> > and see if I can get a better bisect. Any insights into the problem
>>>> > would be appreciated, thanks.