Re: [RFC 09/10] platform/x86/intel/ifs: add ABI documentation for IFS

From: Williams, Dan J
Date: Thu Mar 03 2022 - 19:57:31 EST


On Tue, 2022-03-01 at 11:54 -0800, Jithu Joseph wrote:
> Add the sysfs attributes in ABI/stable for In-Field Scan.
>
> Originally-by: Kyung Min Park <kyung.min.park@xxxxxxxxx>
> Signed-off-by: Jithu Joseph <jithu.joseph@xxxxxxxxx>
> Reviewed-by: Ashok Raj <ashok.raj@xxxxxxxxx>
> Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
>  Documentation/ABI/stable/sysfs-driver-ifs | 85 +++++++++++++++++++++++

If you end up keeping this functionality under /sys/devices/system/cpu
then I think this documentation belongs in:

Documentation/ABI/testing/sysfs-devices-system-cpu

...otherwise, I think it is better off in:

Documentation/ABI/testing/sysfs-devices-platform-ifs


>  1 file changed, 85 insertions(+)
>  create mode 100644 Documentation/ABI/stable/sysfs-driver-ifs
>
> diff --git a/Documentation/ABI/stable/sysfs-driver-ifs b/Documentation/ABI/stable/sysfs-driver-ifs
> new file mode 100644
> index 000000000000..8b6b9472f57e
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-driver-ifs
> @@ -0,0 +1,85 @@
> +What:          /sys/devices/system/cpu/ifs/run_test
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   echo 1 to trigger ifs test for all online cores.

Somewhere in this file it would be good to reference back to the core
documentation, because if this is the first place somebody lands, this
description is not that useful.

> +
> +What:          /sys/devices/system/cpu/ifs/status
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   Global status. Shows the most serious status across
> +               all cores (fail > untested > pass)

Rather than this lossy interface, you might want to emit uevents on test
completion as a way to report the results. A uevent can carry extra
variables in the environment passed to whatever helper processes the
event. See how NVME takes advantage of this in nvme_aen_uevent() and
nvme_class_uevent().
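
To illustrate the consumer side, here is a minimal sketch of how a
user-space helper might decode such an event. The IFS_STATUS and
IFS_DETAILS variable names and the sample values are hypothetical;
nothing in this series defines them. The only thing assumed from the
kernel is the usual raw uevent layout: an "action@devpath" header
followed by NUL-separated KEY=VALUE strings.

```python
def parse_uevent(payload: bytes) -> dict:
    """Split a raw uevent message into a dict of its fields."""
    # Fields are NUL-separated; drop the empty string after the last NUL.
    fields = [f.decode() for f in payload.split(b"\0") if f]
    header, env = fields[0], fields[1:]

    # The first field is "action@devpath".
    action, _, devpath = header.partition("@")
    event = {"ACTION": action, "DEVPATH": devpath}

    # Remaining fields are KEY=VALUE pairs.
    for item in env:
        key, _, value = item.partition("=")
        event[key] = value
    return event


# Hypothetical event for cpu3; IFS_* names and values are made up.
sample = (
    b"change@/devices/system/cpu/cpu3\0"
    b"ACTION=change\0"
    b"DEVPATH=/devices/system/cpu/cpu3\0"
    b"SUBSYSTEM=cpu\0"
    b"IFS_STATUS=fail\0"
    b"IFS_DETAILS=0x8081\0"
)
event = parse_uevent(sample)
```

A udev rule or a netlink listener could feed messages like this to the
tool, so no polling of sysfs is needed to learn that a run finished.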

> +
> +What:          /sys/devices/system/cpu/ifs/image_version
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   Version of loaded IFS binary image.
> +
> +What:          /sys/devices/system/cpu/ifs/reload
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   echo 1 to reload IFS image.
> +
> +What:          /sys/devices/system/cpu/ifs/cpu_pass_list
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   List of cpus which passed the IFS test.

Format of this field? Is it even necessary if the user tooling can just
capture the per-core uevents associated with a test run?
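
If the answer is the usual kernel cpulist format (comma-separated CPU
numbers and ranges, e.g. "0-3,8"), then saying so in the Description
lets tooling parse it directly. The patch does not state the format, so
that is an assumption in the sketch below:

```python
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Parse a cpulist string such as "0-3,8,10-11" into CPU numbers.

    Assumes the conventional kernel cpulist format; the patch under
    review does not say which format the *_list files use.
    """
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means an empty list
        first, sep, last = chunk.partition("-")
        lo = int(first)
        hi = int(last) if sep else lo  # a single CPU is a range of one
        cpus.update(range(lo, hi + 1))
    return sorted(cpus)
```

For example, parse_cpulist("0-3,8\n") yields [0, 1, 2, 3, 8].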

> +
> +What:          /sys/devices/system/cpu/ifs/cpu_fail_list
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   List of cpus which failed the IFS test.
> +
> +What:          /sys/devices/system/cpu/ifs/cpu_untested_list
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   List of cpus which could not be tested.
> +
> +What:          /sys/module/intel_ifs/parameters/noint
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   SAF tunable parameter that user can modify before

"SAF" is never defined.

> +               the scan run if they wish to override default value.
> +
> +               When set, system interrupts are not allowed to interrupt an IFS. The
> +               default state for this parameter is set.

User implications of this setting? Like:

"Note: this setting may cause applications to miss latency / quality
of service deadlines, use with care."



> +
> +What:          /sys/module/intel_ifs/parameters/retry
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   SAF tunable parameter that user can modify at
> +               load time if they wish to override default value.
> +
> +               Maximum retry counter when the test is not executed due to an
> +               event such as interrupt. The default value is 5, it can be set to any
> +               value from 1 to 20.

This seems like something the test tool can trivially handle itself:
it can simply retry the test upon a failure if it wants to.
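
Roughly what I have in mind, as a sketch rather than a proposal for the
actual tool. The sysfs paths are the per-CPU ones from this patch.
Retrying only on "untested", and assuming the write to run_test returns
after the scan has finished, are my assumptions:

```python
import time


def run_with_retry(run_once, max_tries=5, delay_s=0.0,
                   retry_on=("untested",)):
    """Call run_once() until it returns a status not in retry_on.

    Returns (final_status, attempts_made).
    """
    status, attempts = "untested", 0
    while attempts < max_tries:
        status = run_once()
        attempts += 1
        if status not in retry_on:
            break
        if attempts < max_tries:
            time.sleep(delay_s)  # back off before the next attempt
    return status, attempts


def sysfs_run_once(cpu: int) -> str:
    """Trigger one scan on a CPU and read back its status.

    Assumes the write blocks until the scan completes.
    """
    base = f"/sys/devices/system/cpu/cpu{cpu}/ifs"
    with open(f"{base}/run_test", "w") as f:
        f.write("1")
    with open(f"{base}/status") as f:
        return f.read().strip()
```

A tool would then call run_with_retry(lambda: sysfs_run_once(3)) and
choose its own retry count and backoff, with no module parameter needed.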

> +
> +What:          /sys/devices/system/cpu/cpu#/ifs/run_test
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   IFS target core testing. echo 1 to trigger scan test on cpu#.

As mentioned on the last patch, if a CPU mask was an input parameter
then this would not need to be a per-CPU file.
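
For example, the tool could build the mask string itself and write it to
a single file. No such attribute exists in this series, so the sketch
below only shows the string formatting. It assumes the conventional
kernel cpumask text form: comma-separated 32-bit hex words, most
significant word first, as used by files like /proc/irq/N/smp_affinity.

```python
def cpus_to_mask(cpus) -> str:
    """Format an iterable of CPU numbers as a kernel-style hex cpumask."""
    bits = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU number: {cpu}")
        bits |= 1 << cpu

    # Split into 32-bit words, least significant first.
    words = []
    while True:
        words.append(bits & 0xFFFFFFFF)
        bits >>= 32
        if not bits:
            break
    words.reverse()

    # The leading word is unpadded; the rest are zero-padded to 8 digits.
    return ",".join([f"{words[0]:x}"] + [f"{w:08x}" for w in words[1:]])
```

With that, selecting CPUs 0-3 is the string "f", and the per-CPU
run_test files become unnecessary.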

> +
> +What:          /sys/devices/system/cpu/cpu#/ifs/status
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   The status of IFS test on a specific cpu#. It can be one of "pass", "fail"
> +               or "untested".
> +
> +What:          /sys/devices/system/cpu/cpu#/ifs/details
> +Date:          Feb 28, 2022
> +KernelVersion: 5.18.0
> +Contact:       linux-kernel@xxxxxxxxxxxxxxx
> +Description:   The details file reports the hex value of the SCAN_STATUS MSR. Note that
> +               the error_code field may contain driver defined software code not defined
> +               in the Intel SDM.

'status' and 'details' could be uevent output variables per cpu.
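
With per-CPU results delivered that way, the global 'status' and the
three cpu_*_list files can be derived in user space. A minimal sketch,
using the severity order from the patch (fail > untested > pass).
Reporting "untested" when there are no results at all is my assumption:

```python
# Severity order taken from the patch: fail > untested > pass.
SEVERITY = {"pass": 0, "untested": 1, "fail": 2}


def summarize(results: dict) -> dict:
    """Reduce {cpu: status} to a global status plus per-status CPU lists."""
    lists = {"pass": [], "untested": [], "fail": []}
    for cpu, status in sorted(results.items()):
        lists[status].append(cpu)

    # The most serious status wins; with no results, nothing was tested.
    worst = max(results.values(), key=SEVERITY.__getitem__,
                default="untested")
    return {"status": worst, **lists}


summary = summarize({0: "pass", 1: "fail", 2: "untested", 3: "pass"})
```

Here summary["status"] is "fail" and summary["fail"] is [1], which is
the same information the global files would carry.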