Re: [PATCH 16/26] cxl/extent: Realize extent devices

From: Dan Williams
Date: Thu May 02 2024 - 17:13:23 EST


Ira Weiny wrote:
> Jonathan Cameron wrote:
> > On Sun, 24 Mar 2024 16:18:19 -0700
> > ira.weiny@xxxxxxxxx wrote:
> >
> > > From: Navneet Singh <navneet.singh@xxxxxxxxx>
> > >
> > > Once all extents of an interleave set are present a region must
> > > surface an extent to the region.
> > >
> > > Without interleaving; endpoint decoder and region extents have a 1:1
> > > relationship. Future support for IW > 1 will maintain a N:1
> > > relationship between the device extents and region extents.
> > >
> > > Create a region extent device for every device extent found. Release of
> > > the extent device triggers a response to the underlying hardware extent.
> > >
> > > There is no strong use case to support the addition of extents which
> > > overlap previously accepted extent ranges. Reject such new extents
> > > until such time as a good use case emerges.
> > >
> > > Expose the necessary details of region extents by creating the following
> > > sysfs entries.
> > >
> > > /sys/bus/cxl/devices/dax_regionX/extentY
> > > /sys/bus/cxl/devices/dax_regionX/extentY/offset
> > > /sys/bus/cxl/devices/dax_regionX/extentY/length
> > > /sys/bus/cxl/devices/dax_regionX/extentY/label
> >
> > Docs?
>
> That is a good idea.
>
> > The label in particular worries me a little as I'm not sure what
> > is in it.
>
> I envisioned a pass through of the tag.
>
> >
> > If it's the tag one possible format is a uuid (not a coincidence
> > that it is the same length) and interpreting that as characters isn't
> > going to get us far. I wonder if we have to treat it as a binary attr
> > given we have no idea what it is.
>
> In thinking about this more (and running some experiments): none of these are
> strictly necessary in this initial implementation. No code currently uses
> them directly.
>
> I questioned these in the past and I've done so again over the weekend.
>
> I was about to rip them out entirely when I remembered Gregory Price's
> comments on Discord. There he indicating a desire to very carefully place
> dax devices. Without at least the offset and length above (and to a
> lesser extent the label) this can't be done.

Careful placement of dax-devices physically requires an entirely new
allocation ABI. There is the mapping_store() interface that was added
for a specific kexec / VMM fast restore use case, but that never
envisioned the sparse region case. So I do think it is worthwhile to
punt on that question to a later add-on feature.

[..]
> I don't think this is exactly what the user is going to expect. This can
> be resolved by by looking at the dax device mappings though.[0] So I'm going
> to leave this for now. But I expect some additional porcelain is going to
> be required to fully meet Gregory's requirements.

Not sure what the exact requirement is, but if it's the typical, "I want
to allocate by tag", then I think there is another potential coarse
grained solution that probably covers most cases. Allow multiple
dax_regions per cxl_dcd_regions, where each dax_region manages an
exclusive set of tags.

The host negotiates a dax_region tag layout with the orchestrator and
can then trust that all of the extents that show up in a given dax_region
belong to a given tag or set of tags.

This is not something that needs to be considered in the initial
enabling, but is potentially a way to avoid bolting-on a new fine grained
allocation api after the fact.


> [0]
> /sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
> /sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
>
> Back to the label field: It is currently just the 'tag' of the individual
> extent (because no interleaving). My vision for the interleave case would
> be for the kernel to assemble device extents into a region extent only if
> the tags match and export that.
>
> Thinking on it more though we should leave label out for now. This is the
> second time it has been questioned.

I don't understand the issue. That is a critical piece of information
and it is at the cxl device level

/sys/bus/cxl/devices/dax_regionX/extentY/label

..now I would just call that "tag" and UUID format it (to Jonathan's
point), but I see no rationale to hide what is most likely the most
useful information about an extent.