Re: [PATCH v2 0/5] hwmon: k10temp driver improvements

From: Guenter Roeck
Date: Sun Jan 19 2020 - 10:49:38 EST


On 1/19/20 5:38 AM, Holger Kiehl wrote:
On Sat, 18 Jan 2020, Guenter Roeck wrote:

This patch series implements various improvements for the k10temp driver.

Patch 1/5 introduces the use of bit operations.
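
For readers unfamiliar with the idiom, here is a minimal user-space sketch of what BIT()/GENMASK()-style helpers look like when decoding a temperature register. The helper names, the field layout (an 11-bit reading in bits 31..21, 0.125 degree steps) and the range-select flag with its 49 degree offset are assumptions for illustration only; they are not taken from this cover letter.

```python
def bit(n):
    """Single-bit mask, in the spirit of the kernel's BIT(n)."""
    return 1 << n


def genmask(high, low):
    """Contiguous mask covering bits high..low, like GENMASK(high, low)."""
    return ((1 << (high - low + 1)) - 1) << low


# Hypothetical register layout, for illustration only.
CUR_TEMP_SHIFT = 21
CUR_TEMP_MASK = genmask(31, CUR_TEMP_SHIFT)
RANGE_SEL = bit(19)


def decode_temp_millic(regval):
    """Return the temperature in millidegrees Celsius for a raw register value."""
    raw = (regval & CUR_TEMP_MASK) >> CUR_TEMP_SHIFT
    temp = raw * 125              # assumed 0.125 degree C per step
    if regval & RANGE_SEL:        # assumed offset range selected
        temp -= 49000
    return temp


print(decode_temp_millic(360 << 21))
```

Named masks of this kind replace open-coded shifts and magic constants, which is the usual motivation for such a conversion.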

Patch 2/5 converts the driver to use the devm_hwmon_device_register_with_info
API. This not only simplifies the code and reduces its size, it also
makes the code easier to maintain and enhance.

Patch 3/5 adds support for reporting Core Complex Die (CCD) temperatures
on 3rd generation Ryzen (Zen2) CPUs.

Patch 4/5 adds support for reporting core and SoC current and voltage
information on Ryzen CPUs.

Patch 5/5 removes the maximum temperature from Tdie for Ryzen CPUs.
It is inaccurate, misleading, and it just doesn't make sense to report
wrong information.

With all patches in place, output on Ryzen 3900X CPUs looks as follows
(with the system under load).

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +1.36 V
Vsoc: +1.18 V
Tdie: +86.8°C
Tctl: +86.8°C
Tccd1: +80.0°C
Tccd2: +81.8°C
Icore: +44.14 A
Isoc: +13.83 A
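
For anyone who wants to post-process output like the above, a minimal sketch of a parser follows. It assumes the "label: value unit" layout shown in this mail; the function name and regular expression are hypothetical, and it only recognizes the three units that appear in the k10temp block.

```python
import re

# Matches lines such as "Tdie: +86.8°C" or "Vcore: +1.36 V".
READING_RE = re.compile(r"^\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)\s*(°C|V|A)")


def parse_sensors(text):
    """Map each sensor label to a (value, unit) tuple.

    Lines that carry no numeric reading (chip name, "Adapter:" line)
    are skipped.
    """
    readings = {}
    for line in text.splitlines():
        match = READING_RE.match(line)
        if match:
            label, value, unit = match.groups()
            readings[label.strip()] = (float(value), unit)
    return readings


SAMPLE = """k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +1.36 V
Vsoc: +1.18 V
Tdie: +86.8°C
Tctl: +86.8°C
Tccd1: +80.0°C
Tccd2: +81.8°C
Icore: +44.14 A
Isoc: +13.83 A
"""

for label, (value, unit) in parse_sensors(SAMPLE).items():
    print(label, value, unit)
```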

The voltage and current information is limited to Ryzen CPUs. Voltage
and current reporting on Threadripper and EPYC CPUs is different, and the
reported information is either incomplete or wrong. Exclude it for the time
being; it can always be added if/when more information becomes available.

Tested with the following Ryzen CPUs:
1300X (A user with this CPU reported somewhat unexpected values for
Vcore; it is not clear why that is the case. Overall this does not
warrant holding up the series.)
1600
1800X
2200G
2400G
3800X
3900X
3950X

v2: Added Tested-by: tags as received.
Don't display voltage and current information for Threadripper and EPYC.
Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
for Tdie on model 17h/18h CPUs.

Just tested this on a 2400G. Here are the idle values:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +0.77 V
Vsoc: +1.11 V
Tdie: +45.0°C
Tctl: +45.0°C
Icore: +10.39 A
Isoc: +2.89 A

nvme-pci-0100
Adapter: PCI adapter
Composite: +43.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +43.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +48.9°C (low = -273.1°C, high = +65261.8°C)

nct6793-isa-0290
Adapter: ISA adapter
in0: +0.35 V (min = +0.00 V, max = +1.74 V)
in1: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.41 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +0.26 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.66 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.83 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.19 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.84 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.21 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 323 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +112.0°C (high = +0.0°C, hyst = +0.0°C) sensor = thermistor
CPUTIN: +60.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +46.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
AUXTIN1: +106.0°C sensor = thermistor
AUXTIN2: +105.0°C sensor = thermistor
AUXTIN3: +102.0°C sensor = thermistor
SMBUSMASTER 0: +45.0°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
intrusion0: OK
intrusion1: ALARM
beep_enable: disabled

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
edge: +45.0°C (crit = +80.0°C, hyst = +0.0°C)

And here under high load:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +1.32 V
Vsoc: +1.11 V
Tdie: +77.1°C
Tctl: +77.1°C
Icore: +85.22 A
Isoc: +3.61 A

nvme-pci-0100
Adapter: PCI adapter
Composite: +42.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +42.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +45.9°C (low = -273.1°C, high = +65261.8°C)

nct6793-isa-0290
Adapter: ISA adapter
in0: +0.68 V (min = +0.00 V, max = +1.74 V)
in1: +1.84 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +0.26 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.66 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.83 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.19 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.84 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.20 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 1931 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +113.0°C (high = +0.0°C, hyst = +0.0°C) sensor = thermistor
CPUTIN: +64.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +45.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
AUXTIN1: +107.0°C sensor = thermistor
AUXTIN2: +105.0°C sensor = thermistor
AUXTIN3: +102.0°C sensor = thermistor
SMBUSMASTER 0: +77.0°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
intrusion0: OK
intrusion1: ALARM
beep_enable: disabled

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
edge: +77.0°C (crit = +80.0°C, hyst = +0.0°C)

Have also tried this on an EPYC 7302. Before the patch:

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +28.1°C (high = +70.0°C)
Tctl: +28.1°C

and after:

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +28.2°C
Tctl: +28.2°C
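
The before/after difference can also be checked mechanically. The following is a rough sketch, assuming limits always appear in the parenthesized "(name = value)" form seen in the output above; the function name and regular expression are made up for illustration.

```python
import re

# A limit shows up as "(high = ...)", "(crit = ...)" and so on.
LIMIT_RE = re.compile(r"\((?:low|high|crit|min|max)\s*=")


def labels_with_limits(text):
    """Return the set of sensor labels whose line reports a limit."""
    found = set()
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and LIMIT_RE.search(rest):
            found.add(label.strip())
    return found


BEFORE = """k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +28.1°C (high = +70.0°C)
Tctl: +28.1°C
"""

AFTER = """k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +28.2°C
Tctl: +28.2°C
"""

# Labels that lost their limit between the two runs.
print(labels_with_limits(BEFORE) - labels_with_limits(AFTER))
```

On the two EPYC 7302 outputs this reports Tdie as the only label whose limit disappeared, which matches the intent of patch 5/5.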

No extra values shown, but I think this is expected.

Unfortunately yes, but it helps to confirm that the detection works.

Tested-by: Holger Kiehl <holger.kiehl@xxxxxx>


Thanks again!

Guenter