RE: [PATCH Xilinx Alveo 1/8] Documentation: fpga: Add a document describing Alveo XRT drivers

From: Max Zhen
Date: Thu Dec 03 2020 - 20:18:43 EST


Hi Moritz,

I manually fixed some line breaks. Not sure why outlook is not doing it properly.
Let me know if it still looks bad to you.

Please see my reply below.

>
>
> Max,
>
> On Thu, Dec 03, 2020 at 03:38:26AM +0000, Max Zhen wrote:
> > [...cut...]
> >
> > > > > > +xclbin over the User partition as part of DFX. When a user
> > > > > > +requests loading of a specific xclbin the xmgmt management
> > > > > > +driver reads the parent interface UUID specified in the xclbin
> > > > > > +and matches it with child interface UUID exported by Shell to
> > > > > > +determine if xclbin is compatible with the Shell. If match fails loading of xclbin is denied.
> > > > > > +
> > > > > > +xclbin loading is requested using ICAP_DOWNLOAD_AXLF ioctl command.
> > > > > > +When loading xclbin xmgmt driver performs the following operations:
> > > > > > +
> > > > > > +1. Sanity check the xclbin contents 2. Isolate the User
> > > > > > +partition 3. Download the bitstream using the FPGA config engine (ICAP) 4.
> > > > > > +De-isolate the User partition
> > > > > Is this modelled as bridges and regions?
> > > >
> > > > Alveo drivers as written today do not use fpga bridge and region
> > > > framework. It seems that if we add support for that framework, it’s
> > > > possible to receive PR program request from kernel outside of xmgmt driver?
> > > > Currently, we can’t support this and PR program can only be initiated
> > > > using XRT’s runtime API in user space.
> > >
> > > I'm not 100% sure I understand the concern here, let me reply to what
> > > I think I understand:
> > >
> > > You're worried that if you use FPGA region as interface to accept PR
> > > requests something else could attempt to reconfigure the region from
> > > within the kernel using the FPGA Region API?
> > >
> > > Assuming I got this right, I don't think this is a big deal. When you
> > > create the regions you control who gets the references to it.
> >
> > Thanks for explaining. Yes, I think you got my point :-).
>
> We can add code to make a region 'static' or 'one-time' or 'fixed'.
> >
> > >
> > > From what I've seen so far Regions seem to be roughly equivalent to
> > > Partitions, hence my surprise to see a new structure bypassing them.
> >
> > I see where the gap is.
> >
> > Regions in Linux is very different than "partitions" we have defined in xmgmt. Regions seem to be a software data structure
> > representing an area on the FPGA that can be reprogrammed. This area is protected by the concept of "bridge" which can be disabled
> > before program and reenabled after it. And you go through region when you need to reprogram this area.
>
> Your central management driver can create / destroy regions at will. It
> can keep them in a list, array or tree.
>
> Regions can but don't have to have bridges.
>
> If you need to go through the central driver to reprogram a region,
> you can use that to figure out which region to program.

That sounds fine. I can create a region and call into it from xmgmt for
PR programing. The region will, then, call the xmgmt's fpga manager
to program it.

> >
> > The "partition" is part of the main infrastructure of xmgmt driver, which represents a group of subdev drivers for each individual IP
> > (HW subcomponents). Basically, xmgmt root driver is parent of several partitions who is, in turn, the parent of several subdev drivers.
> > The parent manages the life cycle of its children here.
>
> I don't see how this is conceptually different from what DFL does, and
> they managed to use Regions and Bridges.
>
> If things are missing in the framework, please add them instead of
> rewriting an entire parallel framework.
>
> >
> > We do have a partition to represent the group of subdevs/IPs in the reprogrammable area. And we also have partitions
> > representing other areas which cannot be reprogrammed. So, it is difficult to use "Region" to implement "partition".
>
> You implement your regions callbacks, you can return -EINVAL / -ENOTTY
> if you want to fail a reprogramming request to a static partion /
> region.
>
> > From what you have explained, it seems that even if I use region / bridge in xmgmt, we can still keep it private to xmgmt instead of
> > exposing the interface to outside world, which we can't support anyway? This means that region will be used as an internal data
> > structure for xmgmt. Since we can't simply replace partition with region, we might as well just use partition throughout the driver,
> > instead of introducing two data structures and use them both in different places.
>
> Think about your partition as an extension to a region that implements
> what you need to do for your case of enumerating and reprogramming that
> particular piece of your chip.

Yes, we can add region / bridges to represent the PR area and use it in our
code path for reprogramming the PR area. I think what we will do is to
instantiate a region instance for the PR area and associate it with the
FPGA manager in xmgmt for reprogramming it. We can also instantiate
bridges and map the "ULP gate" subdev driver to it in xmgmt. Thus, we
could incorporate region and bridge data structures in xmgmt for PR
reprogramming.

This will be a non-trivial change for us. I'd like to confirm that this is what
you are looking for before we start working on the change. Let us know :-).

>
> > However, if using region/bridge can bring in other benefits, please let us know and we could see if we can also add this to xmgmt.
>
> As maintainer I can say it brings the benefit of looking like existing
> infrastructure we have. We can add features to the framework as needed
> but blanket replacing the entire thing is always a hard sell.
> >
> > > >
> > > > Or maybe we have missed some points about the use case for this
> > > > framework?
> > > >
> >
> > [...cut...]
> >
> > > > > > +-----------------
> > > > > > +
> > > > > > +As mentioned previously xsabin stores metadata which advertise
> > > > > > +HW
> > > > > > subsystems present in a partition.
> > > > > > +The metadata is stored in device tree format with well defined
> > > > > > +schema. Subsystem instantiations are captured as children of
> > > > > > +``addressable_endpoints`` node. Subsystem nodes have standard
> > > > > > attributes like ``reg``, ``interrupts`` etc. Additionally the
> > > > > > nodes also have PCIe specific attributes:
> > > > > > +``pcie_physical_function`` and ``pcie_bar_mapping``. These
> > > > > > +identify which PCIe physical function and which BAR space in
> > > > > > +that physical function the subsystem resides. XRT management
> > > > > > +driver uses this information to bind *platform drivers* to the
> > > > > > +subsystem instantiations. The platform drivers are found in
> > > > > > +**xrt-lib.ko** kernel module defined later. Below is an example
> > > > > > +of device tree for Alveo U50
> > > > > > +platform::
> > > > >
> > > > > I might be missing something, but couldn't you structure the
> > > > > addressable endpoints in a way that encode the physical function
> > > > > as a parent / child relation?
> > > >
> > > > Alveo driver does not generate the metadata. The metadata is
> > > > formatted
> > > > and generated by HW tools when the Alveo HW platform is built.
> > >
> > > Sure, but you control the tools that generate the metadata :) Your
> > > userland can structure / process it however it wants / needs?
> >
> > XRT is a runtime software stack, it is not responsible for generating HW metadata. It is one of the consumers of these data. The shell
> > design is generated by a sophisticated tool framework which is difficult to change.
>
> The Kernel userspace ABI is not going to change once it is merged, which
> is why we need to get it right. You can change your userspace code long
> time after it is merged into the kernel. The otherway round does not
> work.
>
> If you're going to do device-tree you'll need device-tree maintainers to
> be ok with your bindings.
>


Yes, we'll wait for the device-tree maintainers to chime in here :-).

Thanks,
Max

> > However, we will take this as a feedback for future revision of the tool.
> >
> > Thanks,
> > Max
>
> Btw: Can you fix your line-breaks :)
>
> - Moritz