Re: [PATCH] Add support of NVDIMM memory error notification in ACPI 6.2

From: Linda Knippers
Date: Thu Jun 08 2017 - 13:31:07 EST




On 6/7/2017 5:33 PM, Kani, Toshimitsu wrote:
On Wed, 2017-06-07 at 14:06 -0700, Dan Williams wrote:
On Wed, Jun 7, 2017 at 1:57 PM, Kani, Toshimitsu <toshi.kani@xxxxxxx>
wrote:
On Wed, 2017-06-07 at 12:09 -0700, Dan Williams wrote:
On Wed, Jun 7, 2017 at 11:49 AM, Toshi Kani <toshi.kani@xxxxxxx>
wrote:

:
+
+static void acpi_nfit_uc_error_notify(struct device *dev,
acpi_handle handle)
+{
+ struct acpi_nfit_desc *acpi_desc =
dev_get_drvdata(dev);
+
+ acpi_nfit_ars_rescan(acpi_desc);

I wonder if we should gate re-scanning with a similar:

if (acpi_desc->scrub_mode == HW_ERROR_SCRUB_ON)

...check that we do in the mce notification case? Maybe not since
we
don't get an indication of where the error is without a rescan.

I think this mce case is different since the MCE handler already
knows where the new poison location is and can update badblocks
information for it. Starting ARS is an optional precaution.

However, at a minimum I think we need support for the new Start
ARS flag ("If set to 1 the firmware shall return data from a
previous scrub, if any, without starting a new scrub") and use
that for this case.

That's an interesting idea. But I wonder how users know if it is
OK to set this flag as it relies on BIOS implementation that is not
described in ACPI...

Ugh, you're right. We might need a revision-id=2 version of Start ARS
so software knows that the BIOS is aware of the new flag.

My bad. Looking at ACPI 6.2, it actually defines what you described.
Start ARS now defines bit[1] in Flags which can be set to avoid
scanning for this notification. I will update the patch to set this
flag when HW_ERROR_SCRUB_ON is not set.

Wasn't Dan concerned about how the OS can know whether the FW supports
that bit in the Start ARS?

The Query ARS Capabilities DSM has a bit that tells the OS whether the
platform supports the notification and the point of the notification was
to tell the OS it could do a Start ARS with bit 1 set. Of course, if
you get the notification then that means the platform has the capability
to deliver it, but it might not hurt to check the flag from the Query
Capabilities bit.

If the spec isn't clear about the relationship between the bits, we could
clarify that in the spec without bumping the revision number for the call.

We could also check to see what a platform does if it gets a flag it
doesn't know about. Would that be an Invalid Input Parameter?

-- ljk

Thanks,
-Toshiïïìï&ï~ï&ïïï+-ïïÝïïwïïËïïïmïbïïZrïïïï^nïrïïïzïïïhïïïï&ïïïGïïïhï(ïéïÝj"ïïïmïïïïïzïÞïïïfïïïhïïï~ïmï