Re: Hot ADD using CXL1.1 host

From: Shesha Sreenivasamurthy
Date: Tue Jan 31 2023 - 10:22:30 EST




On Mon, Jan 30, 2023 at 2:00 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> Hi Shesha, Linux email expectations are to not top post, i.e. respond
> inline, like below:
>
> Shesha Sreenivasamurthy wrote:
>> The re-configuration does not reset the device. It does re-program the PCIe
>> DVSEC for CXL Device register (Section 8.1.3, CXL 2.0 spec, pg. 258;
>> DVSEC vendor ID 0x1E98, DVSEC ID 0x0).
>> “So you need to dynamically recreate the region, especially if your step 10
>> above resets the device.”
>> Do you mean the DAX region?
>
> No, I mean the CXL region.
>
>> If so, I can if the system stays up. After a few seconds the system
>> crashes. Can the crash be because of a mismatch between DVSEC
>> information with what kernel was informed by BIOS during boot (Some
>> ACPI tables ?)
>
> My concern is that the platform memory decode configuration is not
> prepared for the CXL device to claim more than what was originally
> programmed in the CXL DVSEC range registers. One of the platform
> firmware updates for CXL 2.0 was the creation of the CFMWS (CXL Fixed
> Memory Window Structure) in the ACPI CEDT (CXL Early Discovery Table).
> That structure indicates which platform address ranges decode to which
> CXL host bridges. Those windows are defined in platform specific
> registers (not enumerated to the OS). If the window is only 8GB then
> the endpoint device can not decode more. You would need to reboot to get
> the BIOS to allocate more host address space for CXL.
>
> The expectation for newer platforms is that platform firmware define
> CFMWS such that there is spare capacity in the address map for the OS to
> dynamically map more CXL.
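
To make the window constraint described above concrete, here is a minimal sketch of the containment check it implies: a region can only be decoded if it lies entirely inside the fixed window. The `Window` class, the `fits_in_window` helper, and the base address are all hypothetical and chosen only for illustration; this is not how the kernel represents a CFMWS entry.

```python
from dataclasses import dataclass

GIB = 1 << 30


@dataclass(frozen=True)
class Window:
    """Hypothetical stand-in for one fixed memory window (base and size in bytes)."""
    base: int
    size: int

    @property
    def end(self) -> int:
        # Exclusive end address of the window.
        return self.base + self.size


def fits_in_window(window: Window, region_base: int, region_size: int) -> bool:
    """Return True if [region_base, region_base + region_size) lies inside the window."""
    if region_size <= 0:
        return False
    return window.base <= region_base and region_base + region_size <= window.end


# Illustrative numbers only: an 8 GiB window at a made-up base address.
window = Window(base=0x10_0000_0000, size=8 * GIB)
print(fits_in_window(window, window.base, 8 * GIB))   # region exactly fills the window
print(fits_in_window(window, window.base, 16 * GIB))  # region is larger than the window
```

With an 8 GiB window, a 16 GiB region fails the check no matter what the endpoint's DVSEC range registers advertise, which matches the reboot requirement described above.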

There seems to be some instability when using DAX. When the system is booted with efi=nosoftreserve, so that it is given all of the device memory, stressapptest (https://github.com/stressapptest/stressapptest) runs for an extended period of time. However, when the system is booted without efi=nosoftreserve and the special-purpose memory is assigned to system-ram using daxctl, the system crashes after some time (20-30 minutes). Are there any known instabilities when using DAX?
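
For comparing the two boot configurations, one thing that may help is checking how much memory the kernel reports as "Soft Reserved" in /proc/iomem. The sketch below is only an illustration: `soft_reserved_ranges` is a hypothetical helper, and `SAMPLE` is made-up text in the /proc/iomem format, not output from the system in question.

```python
import re
from typing import List, Tuple

# /proc/iomem lines look like "<start>-<end> : <name>", with hex addresses,
# an inclusive end address, and leading spaces for nested resources.
_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def soft_reserved_ranges(iomem_text: str) -> List[Tuple[int, int]]:
    """Return (start, size_in_bytes) for each 'Soft Reserved' entry in the text."""
    ranges = []
    for line in iomem_text.splitlines():
        match = _LINE.match(line)
        if match and match.group(3) == "Soft Reserved":
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            ranges.append((start, end - start + 1))
    return ranges


# Made-up sample in /proc/iomem format, for illustration only.
SAMPLE = """\
00001000-0009ffff : System RAM
100000000-27fffffff : System RAM
280000000-47fffffff : Soft Reserved
  280000000-47fffffff : dax0.0
"""

for start, size in soft_reserved_ranges(SAMPLE):
    print(f"Soft Reserved at {start:#x}, {size / (1 << 30):.1f} GiB")
```

On a live system the text would come from reading /proc/iomem as root, since unprivileged reads show the addresses as zeros.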