RE: [net-next 1/3] net: dsa: optimize tx timestamp request handling

From: Y.b. Lu
Date: Tue Apr 20 2021 - 03:48:27 EST


> -----Original Message-----
> From: Vladimir Oltean <olteanv@xxxxxxxxx>
> Sent: 2021年4月18日 17:19
> To: Y.b. Lu <yangbo.lu@xxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; Richard Cochran <richardcochran@xxxxxxxxx>;
> Vladimir Oltean <vladimir.oltean@xxxxxxx>; David S . Miller
> <davem@xxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Jonathan Corbet
> <corbet@xxxxxxx>; Kurt Kanzenbach <kurt@xxxxxxxxxxxxx>; Andrew Lunn
> <andrew@xxxxxxx>; Vivien Didelot <vivien.didelot@xxxxxxxxx>; Florian
> Fainelli <f.fainelli@xxxxxxxxx>; Claudiu Manoil <claudiu.manoil@xxxxxxx>;
> Alexandre Belloni <alexandre.belloni@xxxxxxxxxxx>;
> UNGLinuxDriver@xxxxxxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [net-next 1/3] net: dsa: optimize tx timestamp request handling
>
> On Fri, Apr 16, 2021 at 08:36:53PM +0800, Yangbo Lu wrote:
> > Optimization could be done on dsa_skb_tx_timestamp(), and dsa device
> > drivers should adapt to it.
> >
> > - Check SKBTX_HW_TSTAMP request flag at the very beginning, instead of in
> > port_txtstamp, so that most skbs not requiring tx timestamp just return.
>
> Agree that this is a trivial performance optimization with no downside that we
> should be making.
>
> > - No longer to identify PTP packets, and limit tx timestamping only for PTP
> > packets. If device driver likes, let device driver do.
>
> Agree that DSA has a way too heavy hand in imposing upon the driver which
> packets should be timestampable and which ones shouldn't.
>
> For example, I have a latency measurement tool called isochron which is based
> on hardware timestamping of non-PTP packets (in order to not disturb the PTP
> state machines):
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.
> com%2Fvladimiroltean%2Ftsn-scripts&amp;data=04%7C01%7Cyangbo.lu%40
> nxp.com%7C3772f63deb6f4491933208d9024af750%7C686ea1d3bc2b4c6fa9
> 2cd99c5c301635%7C0%7C0%7C637543343267914018%7CUnknown%7CTW
> FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXV
> CI6Mn0%3D%7C1000&amp;sdata=deOn03j6biPEUwppNkv%2F9if1u5HIdLEtzz
> AkWW0p1rc%3D&amp;reserved=0
>
> I can't use it on DSA interfaces, for rather artificial reasons.
>
> > - It is a waste to clone skb directly in dsa_skb_tx_timestamp().
> > For one-step timestamping, a clone is not needed. For any failure of
> > port_txtstamp (this may usually happen), the skb clone has to be freed.
> > So put skb cloning into port_txtstamp where it really needs.
>
> Mostly agree. For two-step timestamping, it is an operation which all drivers
> need to do, so it is in the common potion. If we want to support one-step, we
> need to avoid cloning the PTP packets.
>

Thanks a lot for your ACK.

> > Signed-off-by: Yangbo Lu <yangbo.lu@xxxxxxx>
> > ---
> > Documentation/networking/timestamping.rst | 7 +++++--
> > .../net/dsa/hirschmann/hellcreek_hwtstamp.c | 20 ++++++++++++------
> > .../net/dsa/hirschmann/hellcreek_hwtstamp.h | 2 +-
> > drivers/net/dsa/mv88e6xxx/hwtstamp.c | 21
> +++++++++++++------
> > drivers/net/dsa/mv88e6xxx/hwtstamp.h | 6 +++---
> > drivers/net/dsa/ocelot/felix.c | 11 ++++++----
> > drivers/net/dsa/sja1105/sja1105_ptp.c | 6 +++++-
> > drivers/net/dsa/sja1105/sja1105_ptp.h | 2 +-
> > include/net/dsa.h | 2 +-
> > net/dsa/slave.c | 20 +++++-------------
> > 10 files changed, 57 insertions(+), 40 deletions(-)
> >
> > diff --git a/Documentation/networking/timestamping.rst
> > b/Documentation/networking/timestamping.rst
> > index f682e88fa87e..7f04a699a5d1 100644
> > --- a/Documentation/networking/timestamping.rst
> > +++ b/Documentation/networking/timestamping.rst
> > @@ -635,8 +635,8 @@ in generic code: a BPF classifier
> > (``ptp_classify_raw``) is used to identify PTP event messages (any
> > other packets, including PTP general messages, are not timestamped), and
> provides two hooks to drivers:
> >
> > -- ``.port_txtstamp()``: The driver is passed a clone of the
> > timestampable skb
> > - to be transmitted, before actually transmitting it. Typically, a
> > switch will
> > +- ``.port_txtstamp()``: A clone of the timestampable skb to be
> > +transmitted
> > + is needed, before actually transmitting it. Typically, a switch
> > +will
> > have a PTP TX timestamp register (or sometimes a FIFO) where the
> timestamp
> > becomes available. There may be an IRQ that is raised upon this
> timestamp's
> > availability, or the driver might have to poll after invoking @@
> > -645,6 +645,9 @@ timestamped), and provides two hooks to drivers:
> > later use (when the timestamp becomes available). Each skb is annotated
> with
> > a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's
> > job of keeping track of which clone belongs to which skb.
> > + But one-step timestamping request is handled differently with above
> > + two-step timestamping. The skb clone is no longer needed since
> > + hardware will insert TX time information on packet during egress.
>
> Bonus points for updating the documentation, but I don't quite like the end
> result. Please feel free to restructure more, in order to have a clearer and more
> coherent explanation.
>
> Also, this paragraph from right above is no longer true:
>
> In code, DSA provides for most of the infrastructure for timestamping
> already,
> in generic code: a BPF classifier (``ptp_classify_raw``) is used to identify
> PTP event messages (any other packets, including PTP general messages,
> are not
> timestamped), and provides two hooks to drivers:
>
> It's nothing like that anymore. It's more of a passthrough now with your
> changes, the BPF classifier is not run by the DSA core but optionally by
> individual taggers.
>
> Here is my attempt of rewriting this documentation paragraph, feel free to
> take which parts you consider relevant:
>
> -----------------------------[cut here]-----------------------------
>
> In the generic layer, DSA provides the following infrastructure for PTP
> timestamping:
>
> - ``.port_txtstamp()``: a hook called prior to the transmission of
> packets with a hardware TX timestamping request from user space.
> This is required for two-step timestamping, since the hardware
> timestamp becomes available after the actual MAC transmission, so the
> driver must be prepared to correlate the timestamp with the original
> packet so that it can re-enqueue the packet back into the socket's
> error queue. To save the packet for when the timestamp becomes
> available, the driver can call ``skb_clone_sk`` and save the resulting
> clone in ``DSA_SKB_CB(skb)->clone``. Typically, a switch will have a
> PTP TX timestamp register (or sometimes a FIFO) where the timestamp
> becomes available. In case of a FIFO, the hardware might store
> key-value pairs of PTP sequence ID/message type/domain number and the
> actual timestamp. To perform the correlation correctly between the
> packets in a queue waiting for timestamping and the actual timestamps,
> drivers can use a BPF classifier (``ptp_classify_raw``) to identify
> the PTP transport type, and ``ptp_parse_header`` to interpret the PTP
> header fields. There may be an IRQ that is raised upon this
> timestamp's availability, or the driver might have to poll after
> invoking ``dev_queue_xmit()`` towards the host interface.
> One-step TX timestamping do not require packet cloning, since there is
> no follow-up message required by the PTP protocol (because the
> TX timestamp is embedded into the packet by the MAC), and therefore
> user space does not expect the packet annotated with the TX timestamp
> to be re-enqueued into its socket's error queue.
>
> - ``.port_rxtstamp()``: On RX, the BPF classifier is run by DSA to
> identify PTP event messages (any other packets, including PTP general
> messages, are not timestamped). The original (and only) timestampable
> skb is provided to the driver, for it to annotate it with a timestamp,
> if that is immediately available, or defer to later. On reception,
> timestamps might either be available in-band (through metadata in the
> DSA header, or attached in other ways to the packet), or out-of-band
> (through another RX timestamping FIFO). Deferral on RX is typically
> necessary when retrieving the timestamp needs a sleepable context. In
> that case, it is the responsibility of the DSA driver to call
> ``netif_rx_ni()`` on the freshly timestamped skb.
>
> -----------------------------[cut here]-----------------------------
>

That's really considerate and accuracy description. I will update with this. (Will adjust if we can drop dsa_skb_cb in dsa core.)
Thanks.

> >
> > - ``.port_rxtstamp()``: The original (and only) timestampable skb is provided
> > to the driver, for it to annotate it with a timestamp, if that is
> > immediately diff --git
> > a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c
> > b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c
> > index 69dd9a2e8bb6..2ff4b7c08b72 100644
> > --- a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c
> > +++ b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.c
> > @@ -374,31 +374,39 @@ long hellcreek_hwtstamp_work(struct
> > ptp_clock_info *ptp) }
> >
> > bool hellcreek_port_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *clone, unsigned int type)
> > + struct sk_buff *skb, struct sk_buff **clone)
> > {
> > struct hellcreek *hellcreek = ds->priv;
> > struct hellcreek_port_hwtstamp *ps;
> > struct ptp_header *hdr;
> > + unsigned int type;
> >
> > ps = &hellcreek->ports[port].port_hwtstamp;
> >
> > - /* Check if the driver is expected to do HW timestamping */
> > - if (!(skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP))
> > + type = ptp_classify_raw(skb);
> > + if (type == PTP_CLASS_NONE)
> > return false;
> >
> > /* Make sure the message is a PTP message that needs to be
> timestamped
> > * and the interaction with the HW timestamping is enabled. If not, stop
> > * here
> > */
> > - hdr = hellcreek_should_tstamp(hellcreek, port, clone, type);
> > + hdr = hellcreek_should_tstamp(hellcreek, port, skb, type);
> > if (!hdr)
> > return false;
> >
> > + *clone = skb_clone_sk(skb);
> > + if (!(*clone))
> > + return false;
> > +
> > if (test_and_set_bit_lock(HELLCREEK_HWTSTAMP_TX_IN_PROGRESS,
> > - &ps->state))
> > + &ps->state)) {
> > + kfree_skb(*clone);
> > + *clone = NULL;
> > return false;
> > + }
> >
> > - ps->tx_skb = clone;
> > + ps->tx_skb = *clone;
> >
> > /* store the number of ticks occurred since system start-up till this
> > * moment
> > diff --git a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h
> > b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h
> > index c0745ffa1ebb..58cc96642076 100644
> > --- a/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h
> > +++ b/drivers/net/dsa/hirschmann/hellcreek_hwtstamp.h
> > @@ -45,7 +45,7 @@ int hellcreek_port_hwtstamp_get(struct dsa_switch
> > *ds, int port, bool hellcreek_port_rxtstamp(struct dsa_switch *ds, int port,
> > struct sk_buff *clone, unsigned int type); bool
> > hellcreek_port_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *clone, unsigned int type);
> > + struct sk_buff *skb, struct sk_buff **clone);
> >
> > int hellcreek_get_ts_info(struct dsa_switch *ds, int port,
> > struct ethtool_ts_info *info);
> > diff --git a/drivers/net/dsa/mv88e6xxx/hwtstamp.c
> > b/drivers/net/dsa/mv88e6xxx/hwtstamp.c
> > index 094d17a1d037..280a95962861 100644
> > --- a/drivers/net/dsa/mv88e6xxx/hwtstamp.c
> > +++ b/drivers/net/dsa/mv88e6xxx/hwtstamp.c
> > @@ -469,24 +469,33 @@ long mv88e6xxx_hwtstamp_work(struct
> > ptp_clock_info *ptp) }
> >
> > bool mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *clone, unsigned int type)
> > + struct sk_buff *skb, struct sk_buff **clone)
> > {
> > struct mv88e6xxx_chip *chip = ds->priv;
> > struct mv88e6xxx_port_hwtstamp *ps = &chip->port_hwtstamp[port];
> > struct ptp_header *hdr;
> > + unsigned int type;
> >
> > - if (!(skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP))
> > - return false;
> > + type = ptp_classify_raw(skb);
> > + if (type == PTP_CLASS_NONE)
> > + return false
> >
> > - hdr = mv88e6xxx_should_tstamp(chip, port, clone, type);
> > + hdr = mv88e6xxx_should_tstamp(chip, port, skb, type);
> > if (!hdr)
> > return false;
> >
> > + *clone = skb_clone_sk(skb);
> > + if (!(*clone))
> > + return false;
> > +
> > if (test_and_set_bit_lock(MV88E6XXX_HWTSTAMP_TX_IN_PROGRESS,
> > - &ps->state))
> > + &ps->state)) {
> > + kfree_skb(*clone);
> > + *clone = NULL;
> > return false;
> > + }
> >
> > - ps->tx_skb = clone;
> > + ps->tx_skb = *clone;
> > ps->tx_tstamp_start = jiffies;
> > ps->tx_seq_id = be16_to_cpu(hdr->sequence_id);
> >
> > diff --git a/drivers/net/dsa/mv88e6xxx/hwtstamp.h
> > b/drivers/net/dsa/mv88e6xxx/hwtstamp.h
> > index 9da9f197ba02..da2b253334d0 100644
> > --- a/drivers/net/dsa/mv88e6xxx/hwtstamp.h
> > +++ b/drivers/net/dsa/mv88e6xxx/hwtstamp.h
> > @@ -118,7 +118,7 @@ int mv88e6xxx_port_hwtstamp_get(struct
> dsa_switch
> > *ds, int port, bool mv88e6xxx_port_rxtstamp(struct dsa_switch *ds, int
> port,
> > struct sk_buff *clone, unsigned int type); bool
> > mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *clone, unsigned int type);
> > + struct sk_buff *skb, struct sk_buff **clone);
> >
> > int mv88e6xxx_get_ts_info(struct dsa_switch *ds, int port,
> > struct ethtool_ts_info *info);
> > @@ -152,8 +152,8 @@ static inline bool mv88e6xxx_port_rxtstamp(struct
> > dsa_switch *ds, int port, }
> >
> > static inline bool mv88e6xxx_port_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *clone,
> > - unsigned int type)
> > + struct sk_buff *skb,
> > + struct sk_buff **clone)
> > {
> > return false;
> > }
> > diff --git a/drivers/net/dsa/ocelot/felix.c
> > b/drivers/net/dsa/ocelot/felix.c index 6b5442be0230..cdec2f5e271c
> > 100644
> > --- a/drivers/net/dsa/ocelot/felix.c
> > +++ b/drivers/net/dsa/ocelot/felix.c
> > @@ -1396,14 +1396,17 @@ static bool felix_rxtstamp(struct dsa_switch
> > *ds, int port, }
> >
> > static bool felix_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *clone, unsigned int type)
> > + struct sk_buff *skb, struct sk_buff **clone)
> > {
> > struct ocelot *ocelot = ds->priv;
> > struct ocelot_port *ocelot_port = ocelot->ports[port];
> >
> > - if (ocelot->ptp && (skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP)
> &&
> > - ocelot_port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP) {
> > - ocelot_port_add_txtstamp_skb(ocelot, port, clone);
> > + if (ocelot->ptp && ocelot_port->ptp_cmd ==
> IFH_REW_OP_TWO_STEP_PTP) {
> > + *clone = skb_clone_sk(skb);
> > + if (!(*clone))
> > + return false;
> > +
> > + ocelot_port_add_txtstamp_skb(ocelot, port, *clone);
> > return true;
> > }
> >
> > diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c
> > b/drivers/net/dsa/sja1105/sja1105_ptp.c
> > index 1b90570b257b..6a1f854a8c33 100644
> > --- a/drivers/net/dsa/sja1105/sja1105_ptp.c
> > +++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
> > @@ -436,7 +436,7 @@ bool sja1105_port_rxtstamp(struct dsa_switch *ds,
> int port,
> > * callback, where we will timestamp it synchronously.
> > */
> > bool sja1105_port_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *skb, unsigned int type)
> > + struct sk_buff *skb, struct sk_buff **clone)
> > {
> > struct sja1105_private *priv = ds->priv;
> > struct sja1105_port *sp = &priv->ports[port]; @@ -444,6 +444,10 @@
> > bool sja1105_port_txtstamp(struct dsa_switch *ds, int port,
> > if (!sp->hwts_tx_en)
> > return false;
> >
> > + *clone = skb_clone_sk(skb);
> > + if (!(*clone))
> > + return false;
> > +
> > return true;
> > }
> >
> > diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.h
> > b/drivers/net/dsa/sja1105/sja1105_ptp.h
> > index 3daa33e98e77..ab80b73219cb 100644
> > --- a/drivers/net/dsa/sja1105/sja1105_ptp.h
> > +++ b/drivers/net/dsa/sja1105/sja1105_ptp.h
> > @@ -105,7 +105,7 @@ bool sja1105_port_rxtstamp(struct dsa_switch *ds,
> int port,
> > struct sk_buff *skb, unsigned int type);
> >
> > bool sja1105_port_txtstamp(struct dsa_switch *ds, int port,
> > - struct sk_buff *skb, unsigned int type);
> > + struct sk_buff *skb, struct sk_buff **clone);
> >
> > int sja1105_hwtstamp_get(struct dsa_switch *ds, int port, struct
> > ifreq *ifr);
> >
> > diff --git a/include/net/dsa.h b/include/net/dsa.h index
> > 1259b0f40684..c8415c324e27 100644
> > --- a/include/net/dsa.h
> > +++ b/include/net/dsa.h
> > @@ -734,7 +734,7 @@ struct dsa_switch_ops {
> > int (*port_hwtstamp_set)(struct dsa_switch *ds, int port,
> > struct ifreq *ifr);
> > bool (*port_txtstamp)(struct dsa_switch *ds, int port,
> > - struct sk_buff *clone, unsigned int type);
> > + struct sk_buff *skb, struct sk_buff **clone);
>
> How about not passing "clone" back to DSA as an argument by reference, but
> instead require the driver to populate DSA_SKB_CB(skb)->clone if it needs to do
> so?
>
> Also, how about changing the return type to void? Returning true or false
> makes no difference.
>
> > bool (*port_rxtstamp)(struct dsa_switch *ds, int port,
> > struct sk_buff *skb, unsigned int type);
> >
> > diff --git a/net/dsa/slave.c b/net/dsa/slave.c index
> > 9300cb66e500..5b746a903ef4 100644
> > --- a/net/dsa/slave.c
> > +++ b/net/dsa/slave.c
> > @@ -19,7 +19,6 @@
> > #include <linux/if_bridge.h>
> > #include <linux/if_hsr.h>
> > #include <linux/netpoll.h>
> > -#include <linux/ptp_classify.h>
> >
> > #include "dsa_priv.h"
> >
> > @@ -555,26 +554,19 @@ static void dsa_skb_tx_timestamp(struct
> dsa_slave_priv *p,
> > struct sk_buff *skb)
> > {
> > struct dsa_switch *ds = p->dp->ds;
> > - struct sk_buff *clone;
> > - unsigned int type;
> > + struct sk_buff *clone = NULL;
> >
> > - type = ptp_classify_raw(skb);
> > - if (type == PTP_CLASS_NONE)
> > + if (!(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
> > return;
> >
> > if (!ds->ops->port_txtstamp)
> > return;
> >
> > - clone = skb_clone_sk(skb);
> > - if (!clone)
> > + if (!ds->ops->port_txtstamp(ds, p->dp->index, skb, &clone))
> > return;
> >
> > - if (ds->ops->port_txtstamp(ds, p->dp->index, clone, type)) {
> > + if (clone)
> > DSA_SKB_CB(skb)->clone = clone;
> > - return;
> > - }
> > -
> > - kfree_skb(clone);
> > }
> >
> > netdev_tx_t dsa_enqueue_skb(struct sk_buff *skb, struct net_device
> > *dev) @@ -628,9 +620,7 @@ static netdev_tx_t dsa_slave_xmit(struct
> > sk_buff *skb, struct net_device *dev)
> >
> > DSA_SKB_CB(skb)->clone = NULL;
> >
> > - /* Identify PTP protocol packets, clone them, and pass them to the
> > - * switch driver
> > - */
> > + /* Handle tx timestamp request if has */
>
> s/if has/if any/

Sorry. Will fix.

>
> > dsa_skb_tx_timestamp(p, skb);
> >
> > if (dsa_realloc_skb(skb, dev)) {
> > --
> > 2.25.1
> >