Re: [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome

From: Quan Nguyen
Date: Thu Feb 16 2023 - 21:36:38 EST




On 16/02/2023 19:30, Greg Kroah-Hartman wrote:
On Thu, Feb 16, 2023 at 10:22:14AM +0700, Quan Nguyen wrote:


On 15/02/2023 14:33, Paul Menzel wrote:
Dear Quan,


Thank you for your patch.


Thanks Paul for the review.

Am 14.02.23 um 07:45 schrieb Quan Nguyen:
Add event_dimm[0-15]_syndrome sysfs entries to report the failure syndrome
to the BMC when DIMM training fails.

Were you able to verify that it works? Out of curiosity, how?


Yes, we verified it by injecting DIMM errors and confirming that the errors
were reported correctly via sysfs.
For details on how to do the error injection, please refer to section 3.2,
Memory Error Group, in the Altra Family RAS Error Injection User Manual. It is
shared on Ampere Customer Connect [1]. The latest version is
v1.00_20220329.

[1] https://connect.amperecomputing.com
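
As a rough illustration of the sysfs side of that check (not the injection procedure from the manual), the sketch below collects every event_dimm*_syndrome attribute under the smpro-errmon devices. The function name read_dimm_syndromes and the idea of passing the sysfs root as a parameter are assumptions made for this example. The format of the value in each file is not assumed here, so it is returned as raw text.

```python
import glob
import os


def read_dimm_syndromes(root="/sys/bus/platform/devices"):
    """Return {attribute path: raw contents} for each event_dimm*_syndrome file.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a BMC it would normally be left at its default.
    """
    pattern = os.path.join(root, "smpro-errmon.*", "event_dimm*_syndrome")
    results = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                results[path] = f.read().strip()
        except OSError as err:
            # Record the failure instead of aborting the whole scan.
            results[path] = f"<unreadable: {err}>"
    return results


if __name__ == "__main__":
    for path, value in read_dimm_syndromes().items():
        print(f"{path}: {value or '(empty)'}")
```

Run on the BMC after injecting an error, this should list one line per syndrome attribute, so a failed DIMM can be spotted by its non-empty entry.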

Signed-off-by: Quan Nguyen <quan@xxxxxxxxxxxxxxxxxxxxxx>
---
  .../sysfs-bus-platform-devices-ampere-smpro   | 10 +++
  drivers/misc/smpro-errmon.c                   | 77 +++++++++++++++++++
  2 files changed, 87 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
index d4e3f308c451..c35f1d45e656 100644
--- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
+++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
@@ -265,6 +265,16 @@ Description:
          For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions,
          Altra Family Soc BMC Interface Specification`.
+What:        /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
+KernelVersion:    6.1

Should it be 6.2, as it probably won’t make it into 6.1?


Thanks for the catch. Will fix in next version.

Should be 6.3; it has missed the 6.2 merge window, sorry.

thanks,


Thanks Greg,
Will update to 6.3