Re: [RFT PATCH 0/4] hwmon: k10temp driver improvements

From: Ondrej Čerman
Date: Fri Jan 17 2020 - 17:48:53 EST



On 17. 1. 2020 at 19:46, Guenter Roeck wrote:
On Fri, Jan 17, 2020 at 10:46:25AM +0100, Ondrej Čerman wrote:
On 16. 1. 2020 at 15:17, Guenter Roeck wrote:
This patch series implements various improvements for the k10temp driver.

Patch 1/4 introduces the use of bit operations.

Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
API. This not only simplifies the code and reduces its size, it also
makes the code easier to maintain and enhance.

Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
on Ryzen 3 (Zen2) CPUs.

Patch 4/4 adds support for reporting core and SoC current and voltage
information on Ryzen CPUs.

With all patches in place, output on Ryzen 3900 CPUs looks as follows
(with the system under load).

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +1.36 V
Vsoc: +1.18 V
Tdie: +86.8°C (high = +70.0°C)
Tctl: +86.8°C
Tccd1: +80.0°C
Tccd2: +81.8°C
Icore: +44.14 A
Isoc: +13.83 A

The patch series has only been tested with Ryzen 3900 CPUs. Further test
coverage will be necessary before the changes can be applied to the Linux
kernel.

Hello everyone, I am the author of https://github.com/ocerman/zenpower/ .

It is nice to see this merged.

I just want to warn you that issues with Threadripper CPUs have been
reported on the zenpower issue tracker. Also, I think that no one has tested
EPYC CPUs.

I figured out most of this by trial and error and, unfortunately, because
I do not own a Threadripper CPU, I was not able to test and fix the
reported problems.

Thanks a lot for the note. The key problem seems to be that Threadripper
doesn't report SoC current and voltage. Is that correct? If so, that
should be easy to solve.

Hello,

I thought so initially, but I was wrong. It seems that these multi-node CPUs report SoC and core voltage/current data on a particular node. Look at this HWiNFO64 screenshot of a 2990WX for reference: https://i.imgur.com/yM9X5nd.jpg . They may also be using different addresses and/or factors.

On a side note, drivers/gpu/drm/amd/include/asic_reg/thm/thm_10_0_offset.h
suggests that two more temperature sensors might be available at 0x0005995C
and 0x00059960 (DIE3_TEMP and SW_TEMP). Have you ever tried that?

Thanks,
Guenter

I was aware of 0x0005995C and thought that it could be Tdie3 (that's why I included it in the debug output; someone has already shared that the 3960X reports data at that address). I think this one can be safely included.

I was not aware of the other address, I will try it.

Ondrej.
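
For reference, the two addresses discussed above (0x0005995C and 0x00059960) are four bytes apart, which would fit consecutive 32-bit temperature registers. The sketch below shows how such registers might be addressed and decoded with bit operations. It is only an illustration. The base address, the valid flag, the field mask, the 0.125 degree step with a -49 degree offset, and the helper names ccd_temp_addr() and ccd_temp_decode() are assumptions that this thread does not state; check them against the actual driver patches before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All constants below are assumptions for illustration only. */
#define CCD_TEMP_BASE   0x00059954u  /* assumed address of the first CCD sensor */
#define CCD_TEMP_VALID  (1u << 11)   /* assumed "reading is valid" flag */
#define CCD_TEMP_MASK   0x7FFu       /* assumed raw value field, bits 10..0 */
#define CCD_TEMP_MAX    8u           /* hypothetical upper bound on sensors */

/* Address of the n-th (0-based) sensor, assuming consecutive 32-bit registers. */
static uint32_t ccd_temp_addr(unsigned int n)
{
    assert(n < CCD_TEMP_MAX);
    return CCD_TEMP_BASE + n * 4u;
}

/*
 * Decode a raw register value into millidegrees Celsius.
 * Returns false (and leaves *mdeg untouched) if the valid flag is clear.
 * Assumed scaling: 0.125 degrees per step, offset by -49 degrees.
 */
static bool ccd_temp_decode(uint32_t regval, long *mdeg)
{
    if (!(regval & CCD_TEMP_VALID))
        return false;
    *mdeg = (long)(regval & CCD_TEMP_MASK) * 125 - 49000;
    return true;
}
```

Under these assumptions, ccd_temp_addr(2) and ccd_temp_addr(3) give 0x0005995C and 0x00059960, and a raw value of 0xC08 decodes to 80000 millidegrees, i.e. a reading like the +80.0°C shown for Tccd1 in the sample output.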