Re: [RFC PATCH 1/6] docs: networking: add the document for DFL Ether Group driver

From: Andrew Lunn
Date: Mon Oct 26 2020 - 09:00:16 EST


> > > +The Intel(R) PAC N3000 is a FPGA based SmartNIC platform for multi-workload
> > > +networking application acceleration. A simple diagram of the board:
> > > +
> > > +                     +----------------------------------------+
> > > +                     |                  FPGA                  |
> > > ++----+   +-------+   +-----------+  +----------+  +-----------+   +----------+
> > > +|QSFP|---|retimer|---|Line Side  |--|User logic|--|Host Side  |---|XL710     |
> > > ++----+   +-------+   |Ether Group|  |          |  |Ether Group|   |Ethernet  |
> > > +                     |(PHY + MAC)|  |wiring &  |  |(MAC + PHY)|   |Controller|
> > > +                     +-----------+  |offloading|  +-----------+   +----------+
> > > +                     |              +----------+              |
> > > +                     |                                        |
> > > +                     +----------------------------------------+
> >
> > Is XL710 required? I assume any MAC with the correct MII interface
> > will work?
>
> The XL710 is required for this implementation, in which we have the Host
> Side Ether Group facing the host. The Host Side Ether Group actually
> contains the same IP blocks as Line Side. It contains the compacted MAC &
> PHY functionalities for 25G/40G case. The 25G MAC-PHY soft IP SPEC can
> be found at:
>
> https://www.intel.com/content/www/us/en/programmable/documentation/ewo1447742896786.html
>
> So raw serial data is output from Host Side FPGA, and XL710 is good to
> handle this.

What i have seen working with Marvell Ethernet switches, is that
Marvell normally recommends connecting them to the Ethernet interfaces
of Marvell SoCs. But the switch just needs a compatible MII interface,
and lots of boards make use of non-Marvell MAC chips. Freescale FEC is
very popular.

What i'm trying to say is that ideally we need a collection of generic
drivers for the different major components on the board, and a board
driver which glues it all together. That then allows somebody to build
other boards, or integrate the FPGA directly into an embedded system
directly connected to a SoC, etc.

> > Do you really mean PHY? I actually expect it is PCS?
>
> For this implementation, yes.

Yes, you have a PHY? Or Yes, it is PCS?

To me, the phylib maintainer, having a PHY means you have a base-T
interface, 25Gbase-T, 40Gbase-T? That would be an odd and expensive
architecture when you should be able to just connect SERDES interfaces
together.

> > > +The DFL Ether Group driver registers netdev for each line side link. Users
> > > +could use standard commands (ethtool, ip, ifconfig) for configuration and
> > > +link state/statistics reading. For host side links, they are always connected
> > > +to the host ethernet controller, so they should always have same features as
> > > +the host ethernet controller. There is no need to register netdevs for them.
> >
> > So lets say the XL710 is eth0. The line side netif is eth1. Where do i
> > put the IP address? What interface do i add to quagga OSPF?
>
> The IP address should be put in eth0. eth0 should always be used for the
> tools.

That was what i was afraid of :-)

>
> The line/host side Ether Group is not the terminal of the network data stream.
> Eth1 will not participate in the network data exchange to host.
>
> The main purposes for eth1 are:
> 1. For users to monitor the network statistics on Line Side, and by comparing the
> statistics between eth0 & eth1, users could get some knowledge of how the User
> logic is taking function.
>
> 2. Get the link state of the front panel. The XL710 is now connected to
> Host Side of the FPGA and its link state would be always on. So to
> check the link state of the front panel, we need to query eth1.

This is very non-intuitive. We try to avoid this in the kernel and the
API to userspace. Ethernet switches are always modelled as
accelerators for what the Linux network stack can already do. You
configure an Ethernet switch port in just the same way you configure
any other netdev. You add an IP address to the switch port, you get the
Ethernet statistics from the switch port, routing protocols use the
switch port.

Your design needs to be the same. All configuration needs to happen via
eth1.

Please look at the DSA architecture. What you have here is very
similar to a two port DSA switch. In DSA terminology, we would call
eth0 the master interface. It needs to be up, but otherwise the user
does not configure it. eth1 is the slave interface. It is the user
facing interface of the switch. All configuration happens on this
interface. Linux can also send/receive packets on this netdev. The
slave TX function forwards the frame to the master interface netdev,
via a DSA tagger. Frames which eth0 receives are passed through the
tagger and then passed to the slave interface.
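
To make the data flow concrete, here is a toy model of that
master/slave split. It is only a sketch, not kernel code: the class
names, the one-byte tag format and the port numbering are all made up
for illustration and do not correspond to any real DSA tagger.

```python
# Toy model of the DSA master/slave split; NOT kernel code.
# Every name and the tag layout below are assumptions for illustration.

TAG_LEN = 1  # hypothetical one-byte tag carrying the port number


class Master:
    """Stands in for eth0 (the XL710): it only moves tagged frames."""

    def __init__(self):
        self.wire = []      # tagged frames handed to the hardware
        self.slaves = {}    # port number -> Slave

    def xmit(self, tagged):
        self.wire.append(tagged)

    def receive(self, tagged):
        # Tagger RX path: strip the tag and hand the frame to the slave.
        port, frame = tagged[0], tagged[TAG_LEN:]
        self.slaves[port].deliver(frame)


class Slave:
    """Stands in for eth1: the user-facing port that gets configured."""

    def __init__(self, master, port):
        self.master = master
        self.port = port
        self.rx = []
        self.stats = {"tx_packets": 0, "rx_packets": 0}
        master.slaves[port] = self

    def xmit(self, frame):
        # Tagger TX path: prepend the tag, then forward to the master.
        self.stats["tx_packets"] += 1
        self.master.xmit(bytes([self.port]) + frame)

    def deliver(self, frame):
        self.stats["rx_packets"] += 1
        self.rx.append(frame)


eth0 = Master()
eth1 = Slave(eth0, port=0)
eth1.xmit(b"ping")                   # stack sends on eth1
eth0.receive(bytes([0]) + b"pong")   # hardware delivers on eth0
```

The point of the model is that statistics and traffic are both visible
on eth1, while eth0 is just the conduit.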

All the infrastructure you need is already in place. Please use
it. I'm not saying you need to write a DSA driver, but you should make
use of the same ideas and low level hooks in the network stack which
DSA uses.

> > What about the QSFP socket? Can the host get access to the I2C bus?
> > The pins for TX enable, etc. ethtool -m?
>
> No, the QSFP/I2C are also managed by the BMC firmware, and host doesn't
> have interface to talk to BMC firmware about QSFP.

So can i even tell what SFP is in the socket?

> > > +Speed/Duplex
> > > +------------
> > > +The Ether Group doesn't support auto-negotiation. The link speed is fixed to
> > > +10G, 25G or 40G full duplex according to which Ether Group IP is programmed.
> >
> > So that means, if i pop out the SFP and put in a different one which
> > supports a different speed, it is expected to be broken until the FPGA
> > is reloaded?
>
> It is expected to be broken.

And since i have no access to the SFP information, i have no idea what
is actually broken? Or how i should configure the various layers?

> Now the line side is expected to be configured to 4x10G, 4x25G, 2x25G, 1x25G.
> host side is expected to be 4x10G or 2x40G for XL710.
>
> So 4 channel SFP is expected to be inserted to front panel. And we should use
> 4x25G SFP, which is compatible to 4x10G connection.

So if you had exported the SFP to linux, phylink could have handled some
of this for you. Probably with some extensions to phylink, but Russell
King would probably have helped you. phylink has a good idea how to
decode the SFP EEPROM and figure out the link mode. It has interfaces
to configure PCS blocks, so it could probably deal with the line side
and host side PCS. And it would have been easy to send a udev
notification that the SFP has changed, maybe user space needs to
download a different FPGA bit file? So the user would not see a broken
interface, the hardware could be reconfigured on the fly.
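
As a rough illustration of the decision user space could make on such
a notification, here is a sketch. It is not phylink and does not parse
a real SFP EEPROM. The line side configurations are the ones listed
above; the bit file names, the function and the way the module is
described (lane count plus supported per-lane rates) are assumptions.

```python
# Hypothetical sketch: pick a line side configuration for a new module.
# The configurations come from this thread; the bit file names and the
# module description are invented for illustration.

LINE_SIDE_CONFIGS = {
    # (lanes, Gbit/s per lane) -> hypothetical FPGA bit file
    (4, 10): "line_4x10g.bit",
    (4, 25): "line_4x25g.bit",
    (2, 25): "line_2x25g.bit",
    (1, 25): "line_1x25g.bit",
}


def pick_bitfile(current, module_lanes, module_rates):
    """Return (bitfile, needs_reload), or (None, False) if nothing fits.

    current      -- (lanes, rate) the FPGA is programmed for right now
    module_lanes -- number of lanes the inserted module provides
    module_rates -- set of per-lane rates (Gbit/s) the module supports
    """
    def fits(cfg):
        lanes, rate = cfg
        return lanes <= module_lanes and rate in module_rates

    # Keep the running image if the new module can work with it.
    if fits(current):
        return LINE_SIDE_CONFIGS[current], False

    candidates = [cfg for cfg in LINE_SIDE_CONFIGS if fits(cfg)]
    if not candidates:
        return None, False

    # Otherwise prefer the highest aggregate bandwidth that fits.
    best = max(candidates, key=lambda cfg: cfg[0] * cfg[1])
    return LINE_SIDE_CONFIGS[best], True
```

For example, pick_bitfile((4, 25), 4, {10}) asks for a reload with the
4x10G image, while a module that supports both 10G and 25G lanes keeps
whatever image is already loaded.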

This is one problem i have with this driver. It is based around this
somewhat broken reference design. phylib, along with the hacks you
have, are enough for this reference design. But really you want to
make use of phylink in order to support less limited designs which
will follow. Or you need to push a lot more into the BMC, and not
use phylib at all.

Andrew