Re: [PATCH] cxl/port: Disable decoder setup for endpoints in RCD mode

From: Dan Williams
Date: Tue Feb 14 2023 - 17:29:10 EST


Robert Richter wrote:
> Dan,
>
> On 09.02.23 09:07:18, Dan Williams wrote:
> > Robert Richter wrote:
> > > In RCD mode the HDM decoder capability is optional for endpoints and
> > > may not exist. The HDM range registers are used instead. Since the
> > > driver relies on the existence of an HDM decoder capability, its
> > > absence will cause the initialization of a memory card to fail.
> > >
> > > Moreover, the driver also tries to enable or reuse enabled memory
> > > ranges. In the worst case this may lead to a system hang due to
> > > disabling system memory that was previously provided and setup by
> > > system firmware.
> > >
> > > To solve the issues described, disable decoder setup for RCD endpoints
> > > and instead rely exclusively on system firmware to enable those memory
> > > ranges. Decoders are used by the kernel to setup and configure CXL
> > > memory regions, esp. to enable and disable them. Since Hot-plug is not
> > > supported for devices in RCD mode, the ability to disable that memory
> > > by the kernel using a decoder is not a necessarily requirement,
> > > decoders are not needed then.
> > >
> > > Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
> > > Signed-off-by: Robert Richter <rrichter@xxxxxxx>
> >
> > Does Dave's series address this problem?
> >
> > https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/
> >
> > ...that is arranging for the driver to carry-on in the absence of the
> > HDM Decoder Capability.
>
> it might only solve the missing hdm decoder capability. I need to take
> a closer look if that also solves a system hang I was debugging which
> is caused by clearing the memory disable bit in the hdm dvsec range
> register. So the best would be to use this patch now to fix decoder
> initialization in RCD mode and then have Dave's patches on top. I am
> going to test the series too.

My concern with this patch is that it skips HDM decoder enumeration
entirely in RCD mode. The CXL cards I have seen are CXL 1.1+ and do
export the HDM decoder capability.

The driver turns off mem_enable in a few scenarios. One of them does
look buggy, but it does not seem to be the one you addressed. The driver
should only disable mem if it was also the agent that enabled mem, but
it looks like it does not always do that.
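
As a rough illustration of that ownership rule, here is a minimal
userspace model. It is only a sketch, not the kernel code: struct
fake_dev, model_enable_mem() and model_disable_mem() are hypothetical
names, and a plain bool stands in for the mem_enable bit.

```c
#include <stdbool.h>

/* Hypothetical model of a device; not a kernel structure. */
struct fake_dev {
	bool mem_enable;        /* stands in for the mem_enable bit */
	bool enabled_by_driver; /* true only if the driver set the bit itself */
};

/* Enable memory decode, recording ownership only if we changed the bit. */
static void model_enable_mem(struct fake_dev *dev)
{
	if (dev->mem_enable)
		return; /* already set by another agent, e.g. firmware */
	dev->mem_enable = true;
	dev->enabled_by_driver = true;
}

/* Teardown: undo only what the driver itself did. */
static void model_disable_mem(struct fake_dev *dev)
{
	if (!dev->enabled_by_driver)
		return; /* leave firmware-enabled memory alone */
	dev->mem_enable = false;
	dev->enabled_by_driver = false;
}
```

With this shape, a device that firmware already enabled passes through
enable and disable untouched, while a device the driver enabled is
turned off again on teardown.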

Can you confirm whether the following change fixes the issue?

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c18ed1bbb54d..2db3b5cf41e9 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -385,7 +385,8 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
 	 * If the HDM Decoder Capability is already enabled then assume
 	 * that some other agent like platform firmware set it up.
 	 */
-	if (global_ctrl & CXL_HDM_DECODER_ENABLE || (!hdm && info->mem_enabled))
+	if (!info->mem_enabled &&
+	    (global_ctrl & CXL_HDM_DECODER_ENABLE || !hdm))
 		return devm_cxl_enable_mem(&port->dev, cxlds);
 	else if (!hdm)
 		return -ENODEV;

Otherwise, can you confirm whether the platform provides a CFMWS window
that matches the range-register programming? If this is the problem then
I think this needs a platform quirk to work around a BIOS that violates
kernel expectations.
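
To make the CFMWS question concrete, the check I have in mind is
roughly the following. This is a userspace sketch under assumptions:
struct addr_range, model_range_contains() and model_range_in_cfmws()
are hypothetical names, and "matches" is taken to mean that the
range-register range is fully contained in one window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive address range; not the kernel's struct range. */
struct addr_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/* True if 'inner' lies entirely within 'outer'. */
static bool model_range_contains(const struct addr_range *outer,
				 const struct addr_range *inner)
{
	return inner->start >= outer->start && inner->end <= outer->end;
}

/* True if any of the 'nr' windows fully covers the range-register range. */
static bool model_range_in_cfmws(const struct addr_range *windows, int nr,
				 const struct addr_range *dvsec)
{
	for (int i = 0; i < nr; i++)
		if (model_range_contains(&windows[i], dvsec))
			return true;
	return false;
}
```

The window list would come from the CFMWS entries in the ACPI CEDT and
the candidate range from the device's range registers. If no window
covers the programmed range, that points at the BIOS rather than the
driver.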