Re: [PATCH RFC v2 05/18] cxl/port: Add Dynamic Capacity mode support to endpoint decoders

From: Ira Weiny
Date: Mon Sep 04 2023 - 20:09:13 EST


Jonathan Cameron wrote:
> On Mon, 28 Aug 2023 22:20:56 -0700
> Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
>
> > Endpoint decoders used to map Dynamic Capacity must be configured to
> > point to the correct Dynamic Capacity (DC) Region. The decoder mode
> > currently represents the partition the decoder points to such as ram or
> > pmem.
> >
> > Expand the mode to include DC Regions.
> >
> > Co-developed-by: Navneet Singh <navneet.singh@xxxxxxxxx>
> > Signed-off-by: Navneet Singh <navneet.singh@xxxxxxxxx>
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
>
> I'm reading this in a linear fashion for now (and ideally that should
> always make sense) so I don't currently see the reason for the loops
> in here. If they are needed for a future patch, add something to the
> description to indicate that.
>
> >
> > ---
> > Changes for v2:
> > [iweiny: split from region creation patch]
> > ---
> > Documentation/ABI/testing/sysfs-bus-cxl | 19 ++++++++++---------
> > drivers/cxl/core/hdm.c | 24 ++++++++++++++++++++++++
> > drivers/cxl/core/port.c | 16 ++++++++++++++++
> > 3 files changed, 50 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index 6350dd82b9a9..2268ffcdb604 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -257,22 +257,23 @@ Description:
> >
> > What: /sys/bus/cxl/devices/decoderX.Y/mode
> > Date: May, 2022
> > -KernelVersion: v6.0
> > +KernelVersion: v6.0, v6.6 (dcY)
> > Contact: linux-cxl@xxxxxxxxxxxxxxx
> > Description:
> > (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
> > translates from a host physical address range, to a device local
> > address range. Device-local address ranges are further split
> > - into a 'ram' (volatile memory) range and 'pmem' (persistent
> > - memory) range. The 'mode' attribute emits one of 'ram', 'pmem',
> > - 'mixed', or 'none'. The 'mixed' indication is for error cases
> > - when a decoder straddles the volatile/persistent partition
> > - boundary, and 'none' indicates the decoder is not actively
> > - decoding, or no DPA allocation policy has been set.
> > + into a 'ram' (volatile memory) range, 'pmem' (persistent
> > + memory) range, or Dynamic Capacity (DC) range. The 'mode'
> > + attribute emits one of 'ram', 'pmem', 'dcY', 'mixed', or
> > + 'none'. The 'mixed' indication is for error cases when a
> > + decoder straddles the volatile/persistent partition boundary,
> > + and 'none' indicates the decoder is not actively decoding, or
> > + no DPA allocation policy has been set.
> >
> > 'mode' can be written, when the decoder is in the 'disabled'
> > - state, with either 'ram' or 'pmem' to set the boundaries for the
> > - next allocation.
> > + state, with 'ram', 'pmem', or 'dcY' to set the boundaries for
> > + the next allocation.
> >
> >
> > What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index a254f79dd4e8..3f4af1f5fac8 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -267,6 +267,19 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled)
> > __cxl_dpa_release(cxled);
> > }
> >
> > +static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
> > +{
> > + int index = 0;
> > +
> > + for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
> As you are relying on them being in order and adjacent for the loop, why is
>
> if (mode < CXL_DECODER_DC0 || mode > CXL_DECODER_DC7)
> return -EINVAL;
>
> return mode - CXL_DECODER_DC0;
>
> Not sufficient?

That would work, yes. There is no future need for a loop; it was just
implemented this way early on and I did not think about it much.

Done.
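
For reference, a minimal standalone sketch of what the simplified helper
could look like. The enum below is only a stand-in for illustration (its
members outside DC0..DC7 and their order are assumptions, not the driver's
definition); the only property relied on is that the DC modes are
contiguous and in order.

```c
#include <errno.h>

/* Stand-in enum for illustration only; not the kernel's definition. */
enum cxl_decoder_mode {
	CXL_DECODER_NONE,
	CXL_DECODER_RAM,
	CXL_DECODER_PMEM,
	CXL_DECODER_DC0,
	CXL_DECODER_DC1,
	CXL_DECODER_DC2,
	CXL_DECODER_DC3,
	CXL_DECODER_DC4,
	CXL_DECODER_DC5,
	CXL_DECODER_DC6,
	CXL_DECODER_DC7,
	CXL_DECODER_MIXED,
};

/*
 * Map a DC decoder mode to its region index (0..7).
 * Returns -EINVAL for any mode outside DC0..DC7.
 */
static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
{
	if (mode < CXL_DECODER_DC0 || mode > CXL_DECODER_DC7)
		return -EINVAL;

	return mode - CXL_DECODER_DC0;
}
```

The range check replaces the loop entirely, so the index falls out of a
single subtraction.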

>
> > + if (mode == i)
> > + return index;
> > + index++;
> > + }
> > +
> > + return -EINVAL;
> > +}
> > +
> > static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> > resource_size_t base, resource_size_t len,
> > resource_size_t skipped)
> > @@ -429,6 +442,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> > switch (mode) {
> > case CXL_DECODER_RAM:
> > case CXL_DECODER_PMEM:
> > + case CXL_DECODER_DC0 ... CXL_DECODER_DC7:
> > break;
> > default:
> > dev_dbg(dev, "unsupported mode: %d\n", mode);
> > @@ -456,6 +470,16 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
> > goto out;
> > }
> >
> > + for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
> > + int index = dc_mode_to_region_index(i);
> > +
> > + if (mode == i && !resource_size(&cxlds->dc_res[index])) {
>
> Not obvious why we have the loop in this patch - perhaps it makes sense later.

I think it was just walking through the DC regions like the previous code
was walking through the PMEM/RAM 'regions'.

> If this is to enable later changes, then good to say that in the patch description.

... nope...

> otherwise, something like.
>
> int index;
>
> rc = dc_mode_to_region_index(i);
> if (rc < 0)
> goto out;
>
> index = rc;
> if (!resource_size(&cxlds->dc_res[index])) {
> ....
>

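Yeah... but that won't work exactly as written: without the loop there is
no 'i', and the DC lookup has to be skipped for the ram/pmem modes.
Something like this: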

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index cf5d656c271b..f250d1566682 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -463,10 +463,12 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 		goto out;
 	}

-	for (int i = CXL_DECODER_DC0; i <= CXL_DECODER_DC7; i++) {
-		int index = dc_mode_to_region_index(i);
+	if (cxl_decoder_mode_is_dc(mode)) {
+		rc = dc_mode_to_region_index(mode);
+		if (rc < 0)
+			goto out;

-		if (mode == i && !resource_size(&cxlds->dc_res[index])) {
+		if (!resource_size(&cxlds->dc_res[rc])) {
 			dev_dbg(dev, "no available dynamic capacity\n");
 			rc = -ENXIO;
 			goto out;

But looking at the function, I think there could be a cleanup patch before
this one; I don't see the need to check the mode twice.

... Yes I think that looks cleaner.
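
To make the intended control flow concrete, here is a rough standalone
model of the DC check. It is a sketch under assumptions, not the driver
code: the enum is a stand-in, the body of cxl_decoder_mode_is_dc() is
assumed (only its name appears in the diff), and 'struct dc_state' with
check_dc_capacity() are hypothetical names standing in for cxlds->dc_res[]
and the relevant part of cxl_dpa_set_mode().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in enum for illustration only; not the kernel's definition. */
enum cxl_decoder_mode {
	CXL_DECODER_NONE,
	CXL_DECODER_RAM,
	CXL_DECODER_PMEM,
	CXL_DECODER_DC0,
	CXL_DECODER_DC1,
	CXL_DECODER_DC2,
	CXL_DECODER_DC3,
	CXL_DECODER_DC4,
	CXL_DECODER_DC5,
	CXL_DECODER_DC6,
	CXL_DECODER_DC7,
	CXL_DECODER_MIXED,
};

/* Hypothetical model of the per-region DC resource sizes. */
struct dc_state {
	size_t dc_size[8];
};

/* Assumed body: true only for the contiguous DC0..DC7 modes. */
static bool cxl_decoder_mode_is_dc(enum cxl_decoder_mode mode)
{
	return mode >= CXL_DECODER_DC0 && mode <= CXL_DECODER_DC7;
}

static int dc_mode_to_region_index(enum cxl_decoder_mode mode)
{
	if (!cxl_decoder_mode_is_dc(mode))
		return -EINVAL;

	return mode - CXL_DECODER_DC0;
}

/*
 * Hypothetical helper mirroring the check in the diff above:
 * non-DC modes pass through, DC modes need a non-empty region.
 */
static int check_dc_capacity(const struct dc_state *st,
			     enum cxl_decoder_mode mode)
{
	int rc;

	if (!cxl_decoder_mode_is_dc(mode))
		return 0;

	rc = dc_mode_to_region_index(mode);
	if (rc < 0)
		return rc;

	/* An empty region means there is no dynamic capacity to map. */
	if (!st->dc_size[rc])
		return -ENXIO;

	return 0;
}
```

The guard on cxl_decoder_mode_is_dc() is what keeps ram and pmem from
tripping over the -EINVAL return of the index helper.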

Thanks for the review!
Ira