Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region miscalculation
From: Alison Schofield
Date: Mon Jun 02 2025 - 12:48:40 EST
On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
> >-----Original Message-----
> >From: Alison Schofield <alison.schofield@xxxxxxxxx>
> >Sent: 30 May 2025 19:28
> >To: Li Ming <ming.li@xxxxxxxxxxxx>
> >Cc: dave@xxxxxxxxxxxx; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>;
> >dave.jiang@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
> >dan.j.williams@xxxxxxxxx; Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-
> >cxl@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
> >miscalculation
> >
> >On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
> >> When trying to update the scrub_cycle value of a cxl region, which
> >> means updating the scrub_cycle value of each memdev under a cxl
> >> region. cxl driver needs to guarantee the new scrub_cycle value is
> >> greater than the min_scrub_cycle value of a memdev, otherwise the
> >> updating operation will fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).
> >>
> >> Current implementation logic of getting the min_scrub_cycle value of a
> >> cxl region is that getting the min_scrub_cycle value of each memdevs
> >> under the cxl region, then using the minimum min_scrub_cycle value as
> >> the region's min_scrub_cycle. Checking if the new scrub_cycle value is
> >> greater than this value. If yes, updating the new scrub_cycle value to
> >> each memdevs. The issue is that the new scrub_cycle value is possibly
> >> greater than the minimum min_scrub_cycle value of all memdevs but less
> >> than the maximum min_scrub_cycle value of all memdevs if memdevs have
> >> a different min_scrub_cycle value. The updating operation will always
> >> fail on these memdevs which have a greater min_scrub_cycle than the
> >> new scrub_cycle.
> >>
> >> The correct implementation logic is to get the maximum value of these
> >> memdevs' min_scrub_cycle, check if the new scrub_cycle value is
> >> greater than the value. If yes, the new scrub_cycle value is fit for the region.
> >>
> >> The change also impacts the result of
> >> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
> >> minimum min_scrub_cycle value among all memdevs under the region
> >> before the change. The interface will return the maximum
> >> min_scrub_cycle value among all memdevs under the region with the change.
> >>
> >> Signed-off-by: Li Ming <ming.li@xxxxxxxxxxxx>
> >> ---
> >> I made this change based on my understanding on the SPEC and current
> >> CXL EDAC code, but I am not sure if it is a bug or it is designed this way.
> >
> >The attribute is defined to show (per Documentation/ABI/testing/sysfs-edac-
> >scrub)
> > "Supported minimum scrub cycle duration in seconds by the memory
> >scrubber."
> >
> >Your fix, making the min the max of the mins, looks needed.
> >
> >I took a look at the max attribute. If the min is the max on the mins, then the
> >max should be the max of the maxes. But, not true. We do this:
> >
> >instead: *max = U8_MAX * 3600; /* Max set by register size */
> >
> >The comment isn't helping me, esp since the sysfs description doesn't explain
> >that we are using a constant max.
> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature Readable Attributes
> does not define a field for "max scrub cycle supported". Thus for max scrub
> cycle, returning max value of (U8_MAX) of patrol scrub cycle field.
Understand that now, thanks. I'm still wondering if both these deserve
more explanation in Documentation/ABI/testing/sysfs-edac-scrub
explaining the calculations. Like if the device represents an aggregate
of devices, like a region, the min scrub cycle is the max of the mins,
whereas if the device is a single, it's exactly what the device
returned. And for max, explaining what you replied above.
Regardless of this noise I'm making about the Docs.. I think Ming
should go ahead and v1 the fix for the min calc.
--Alison
>
> Thanks,
> Shiju
> >
> >
> >>
> >> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
> >> ---
> >> drivers/cxl/core/edac.c | 8 ++++++--
> >> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index
> >> 2cbc664e5d62..ad243cfe00e7 100644
> >> --- a/drivers/cxl/core/edac.c
> >> +++ b/drivers/cxl/core/edac.c
> >> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct
> >cxl_patrol_scrub_context *cxl_ps_ctx,
> >> u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
> >{
> >> struct cxl_mailbox *cxl_mbox;
> >> - u8 min_scrub_cycle = U8_MAX;
> >> struct cxl_region_params *p;
> >> struct cxl_memdev *cxlmd;
> >> struct cxl_region *cxlr;
> >> + u8 min_scrub_cycle = 0;
> >> int i, ret;
> >>
> >> if (!cxl_ps_ctx->cxlr) {
> >> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct
> >cxl_patrol_scrub_context *cxl_ps_ctx,
> >> if (ret)
> >> return ret;
> >>
> >> + /*
> >> + * The min_scrub_cycle of a region is the maximum value
> >among
> >> + * the min_scrub_cycle of all the memdevs under the region.
> >> + */
> >> if (min_cycle)
> >> - min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
> >> + min_scrub_cycle = max(*min_cycle, min_scrub_cycle);
> >> }
> >>
> >> if (min_cycle)
> >> --
> >> 2.34.1
> >>
>