Re: [PATCH v2 0/2] Simplify mtty driver and mdev core

From: Alex Williamson
Date: Wed Aug 21 2019 - 00:57:30 EST


On Wed, 21 Aug 2019 04:40:15 +0000
Parav Pandit <parav@xxxxxxxxxxxx> wrote:

> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Wednesday, August 21, 2019 9:51 AM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: Jiri Pirko <jiri@xxxxxxxxxxxx>; David S . Miller <davem@xxxxxxxxxxxxx>;
> > Kirti Wankhede <kwankhede@xxxxxxxxxx>; Cornelia Huck
> > <cohuck@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > cjia <cjia@xxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> >
> > On Wed, 21 Aug 2019 03:42:25 +0000
> > Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> >
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > Sent: Tuesday, August 20, 2019 10:49 PM
> > > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > Cc: Jiri Pirko <jiri@xxxxxxxxxxxx>; David S . Miller
> > > > <davem@xxxxxxxxxxxxx>; Kirti Wankhede <kwankhede@xxxxxxxxxx>;
> > > > Cornelia Huck <cohuck@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx;
> > > > linux-kernel@xxxxxxxxxxxxxxx; cjia <cjia@xxxxxxxxxx>;
> > > > netdev@xxxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > On Tue, 20 Aug 2019 08:58:02 +0000
> > > > Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> > > >
> > > > > + Dave.
> > > > >
> > > > > Hi Jiri, Dave, Alex, Kirti, Cornelia,
> > > > >
> > > > > Please provide your feedback on it, how shall we proceed?
> > > > >
> > > > > Short summary of requirements.
> > > > > For a given mdev (mediated device [1]), there is one representor
> > > > > netdevice and devlink port in switchdev mode (similar to SR-IOV
> > > > > VF), And there is one netdevice for the actual mdev when mdev is probed.
> > > > >
> > > > > (a) representor netdev and devlink port should be able derive
> > > > > phys_port_name(). So that representor netdev name can be built
> > > > > deterministically across reboots.
> > > > >
> > > > > (b) for mdev's netdevice, mdev's device should have an attribute.
> > > > > This attribute can be used by udev rules/systemd or something else
> > > > > to rename netdev name deterministically.
> > > > >
> > > > > (c) IFNAMSIZ of 16 bytes is too small to fit whole UUID.
> > > > > A simple grep IFNAMSIZ in stack hints hundreds of users of
> > > > > IFNAMSIZ in drivers, uapi, netlink, boot config area and more.
> > > > > Changing IFNAMSIZ for a mdev bus doesn't really look reasonable option
> > to me.
> > > >
> > > > How many characters do we really have to work with? Your examples
> > > > below prepend various characters, ex. option-1 results in ens2f0_m10
> > > > or enm10. Do the extra 8 or 3 characters in these count against IFNAMSIZ?
> > > >
> > > Maximum 15. Last is null termination.
> > > Some udev rules setting by user prefix the PF netdev interface. I took such
> > example below where ens2f0 netdev named is prefixed.
> > > Some prefer not to prefix.
> > >
> > > > > Hence, I would like to discuss below options.
> > > > >
> > > > > Option-1: mdev index
> > > > > Introduce an optional mdev index/handle as u32 during mdev create
> > > > > time. User passes mdev index/handle as input.
> > > > >
> > > > > phys_port_name=mIndex=m%u
> > > > > mdev_index will be available in sysfs as mdev attribute for udev
> > > > > to name the mdev's netdev.
> > > > >
> > > > > example mdev create command:
> > > > > UUID=$(uuidgen)
> > > > > echo $UUID index=10
> > > > > > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > > >
> > > > Nit, IIRC previous discussions of additional parameters used comma
> > > > separators, ex. echo $UUID,index=10 >...
> > > >
> > > Yes, ok.
> > >
> > > > > > example netdevs:
> > > > > repnetdev=ens2f0_m10 /*ens2f0 is parent PF's netdevice */
> > > >
> > > > Is the parent really relevant in the name?
> > > No. I just picked one udev example who prefixed the parent netdev name.
> > > But there are users who do not prefix it.
> > >
> > > > Tools like mdevctl are meant to
> > > > provide persistence, creating the same mdev devices on the same
> > > > parent, but that's simply the easiest policy decision. We can also
> > > > imagine that multiple parent devices might support a specified mdev
> > > > type and policies factoring in proximity, load-balancing, power
> > > > consumption, etc might be weighed such that we really don't want to
> > > > promote userspace creating dependencies on the parent association.
> > > >
> > > > > mdev_netdev=enm10
> > > > >
> > > > > Pros:
> > > > > 1. mdevctl and any other existing tools are unaffected.
> > > > > 2. netdev stack, ovs and other switching platforms are unaffected.
> > > > > 3. achieves unique phys_port_name for representor netdev 4.
> > > > > achieves unique mdev eth netdev name for the mdev using udev/systemd
> > extension.
> > > > > 5. Aligns well with mdev and netdev subsystem and similar to
> > > > > existing sriov bdf's.
> > > >
> > > > A user provided index seems strange to me. It's not really an
> > > > index, just a user specified instance number. Presumably you have
> > > > the user providing this because if it really were an index, then the
> > > > value depends on the creation order and persistence is lost. Now
> > > > the user needs to both avoid uuid collision as well as "index"
> > > > number collision. The uuid namespace is large enough to mostly ignore
> > this, but this is not. This seems like a burden.
> > > >
> > > I liked the term 'instance number', which is lot better way to say than
> > index/handle.
> > > Yes, user needs to avoid both the collision.
> > > UUID collision should not occur in most cases, they way UUID are generated.
> > > So practically users needs to pick unique 'instance number', similar to how it
> > picks unique netdev names.
> > >
> > > Burden to user comes from the requirement to get uniqueness.
> > >
> > > > > Option-2: shorter mdev name
> > > > > Extend mdev to have shorter mdev device name in addition to UUID.
> > > > > such as 'foo', 'bar'.
> > > > > Mdev will continue to have UUID.
> > > > > phys_port_name=mdev_name
> > > > >
> > > > > Pros:
> > > > > 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> > > > > It is common practice to upgrade iproute2 package along with the
> > > > > kernel. Similar practice to be done with mdevctl.
> > > > > 2. Newer users of mdevctl who wants to work with non_UUID names,
> > > > > will use newer mdevctl/tools. Cons:
> > > > > 1. Dual naming scheme of mdev might affect some of the existing tools.
> > > > > It's unclear how/if it actually affects.
> > > > > mdevctl [2] is very recently developed and can be enhanced for
> > > > > dual naming scheme.
> > > >
> > > > I think we've already nak'ed this one, the device namespace becomes
> > > > meaningless if the name becomes just a string where a uuid might be
> > > > an example string. mdevs are named by uuid.
> > > >
> > > > > Option-3: mdev uuid alias
> > > > > Instead of shorter mdev name or mdev index, have alpha-numeric
> > > > > name alias. Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> > > > > example mdev create command:
> > > > > UUID=$(uuidgen)
> > > > > echo $UUID alias=foo
> > > > > > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > > > > > example netdevs:
> > > > > examle netdevs:
> > > > > repnetdev = ens2f0_mfoo
> > > > > mdev_netdev=enmfoo
> > > > >
> > > > > Pros:
> > > > > 1. All same as option-1.
> > > > > 2. Doesn't affect existing mdev naming scheme.
> > > > > Cons:
> > > > > 1. Index scheme of option-1 is better which can number large
> > > > > number of mdevs with fewer characters, simplifying the management
> > tool.
> > > >
> > > > No better than option-1, simply a larger secondary namespace, but
> > > > still requires the user to come up with two independent names for the
> > device.
> > > >
> > > > > Option-4: extend IFNAMESZ to be 64 bytes Extended IFNAMESZ from 16
> > > > > to
> > > > > 64 bytes phys_port_name=mdev_UUID_string
> > mdev_netdev_name=enmUUID
> > > > >
> > > > > Pros:
> > > > > 1. Doesn't require mdev extension
> > > > > Cons:
> > > > > 1. netdev stack, driver, uapi, user space, boot config wide changes 2.
> > > > > Possible user space extensions who assumed name size being 16
> > > > > characters 3. Single device type demands namesize change for all
> > > > > netdev types
> > > >
> > > > What about an alias based on the uuid? For example, we use 160-bit
> > > > sha1s daily with git (uuids are only 128-bit), but we generally
> > > > don't reference git commits with the full 20 character string.
> > > > Generally 12 characters is recommended to avoid ambiguity. Could
> > > > mdev automatically create an
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > abbreviated sha1 alias for the device? If so, how many characters
> > > > should we
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > use and what do we do on collision? The colliding device could add
> > > > enough alias characters to disambiguate (we likely couldn't re-alias
> > > > the existing device to disambiguate, but I'm not sure it matters,
> > > > userspace has sysfs to associate aliases). Ex.
> > > >
> > > > UUID=$(uuidgen)
> > > > ALIAS=$(echo $UUID | sha1sum | colrm 13)
> > > >
> > > I explained in previous reply to Cornelia, we should set UUID and ALIAS at the
> > same time.
> > > Setting is via different sysfs attribute is lot code burden with no extra benefit.
> >
> > Just an example of the alias, not proposing how it's set. In fact, proposing that
> > the user does not set it, mdev-core provides one automatically.
> >
> > > > Since there seems to be some prefix overhead, as I ask about above
> > > > in how many characters we actually have to work with in IFNAMESZ,
> > > > maybe we start with 8 characters (matching your "index" namespace)
> > > > and expand as necessary for disambiguation. If we can eliminate
> > > > overhead in IFNAMESZ, let's start with 12. Thanks,
> > > >
> > > If user is going to choose the alias, why does it have to be limited to sha1?
> > > Or you just told it as an example?
> > >
> > > It can be an alpha-numeric string.
> >
> > No, I'm proposing a different solution where mdev-core creates an alias based
> > on an abbreviated sha1. The user does not provide the alias.
> >
> > > Instead of mdev imposing number of characters on the alias, it should be best
> > left to the user.
> > > Because in future if netdev improves on the naming scheme, mdev will be
> > limiting it, which is not right.
> > > So not restricting alias size seems right to me.
> > > User configuring mdev for networking devices in a given kernel knows what
> > user is doing.
> > > So user can choose alias name size as it finds suitable.
> >
> > That's not what I'm proposing, please read again. Thanks,
>
> I understood your point. But mdev doesn't know how user is going to use udev/systemd to name the netdev.
> So even if mdev chose to pick 12 characters, it could result in collision.
> Hence the proposal to provide the alias by the user, as user know the best policy for its use case in the environment its using.
> So 12 character sha1 method will still work by user.

Haven't you already provided examples where certain drivers or
subsystems have unique netdev prefixes? If mdev provides a unique
alias within the subsystem, couldn't we simply define a netdev prefix
for the mdev subsystem and avoid all other collisions? I'm not in favor
of the user providing both a uuid and an alias/instance. Thanks,

Alex