Re: [PATCH] drivers/rxe: improve rxe loopback

From: Leon Romanovsky
Date: Thu Jul 27 2017 - 06:40:28 EST


On Thu, Jul 27, 2017 at 12:49:17PM +0300, Marcel Apfelbaum wrote:
> On 27/07/2017 10:36, Leon Romanovsky wrote:
> > On Wed, Jul 26, 2017 at 05:52:48PM +0300, Marcel Apfelbaum wrote:
> > > Currently a packet is marked for loopback only if the source and
> > > destination address match. This is not enough when multiple
> > > gids are present in rxe's gid table and the traffic is
> > > from one gid to another.
> > >
> > > Fix it by marking the packet for loopback if the destination
> > > address appears in rxe's gid table.
> > >
> > > Signed-off-by: Marcel Apfelbaum <marcel@xxxxxxxxxx>
> > > ---
> > > drivers/infiniband/sw/rxe/rxe_net.c | 47 +++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 45 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> > > index c3a140e..b76a9a3 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_net.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> > > @@ -351,6 +351,27 @@ static void prepare_ipv6_hdr(struct dst_entry *dst, struct sk_buff *skb,
> > > ip6h->payload_len = htons(skb->len - sizeof(*ip6h));
> > > }
> > >
> > > +static inline bool addr4_same_rxe(struct rxe_dev *rxe, struct in_addr *daddr)
> > > +{
>
> Hi Leon,
> Thanks for the review.
>
> >
> > In addition to Moni's comment, no "inline" functions in *.c files, please.
> >
>
> Sure, I simply followed the function on the same file:
> static inline int addr_same(struct rxe_dev *rxe, struct rxe_av *av)
> I even borrowed the name...
>
> > > + struct in_device *in_dev;
> > > + bool same_rxe = false;
> > > +
> > > + rcu_read_lock();
> > > + in_dev = __in_dev_get_rcu(rxe->ndev);
> > > + if (!in_dev)
> > > + goto out;
> > > +
> > > + for_ifa(in_dev)
> > > + if (!memcmp(&ifa->ifa_address, daddr, sizeof(*daddr))) {
> > > + same_rxe = true;
> > > + goto out;
> > > + }
> > > + endfor_ifa(in_dev);
> >
> > I'm afraid that it will decrease performance drastically. One of the
> > possible solutions to overcome it, is to check the address of first packet
> > only, but it will work for RC only.
> >
>
> How do you know which packet is "the first"?
> And yes, for UD the performance would decrease, but only
> if the netdev has multiple IPs, right?

Yes. For an RC QP, the "first packet" is simply the first lookup: RC QPs are
created with a "static" address, so the result of that lookup can be reused
for later packets.
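
To make the idea concrete, here is a rough userspace sketch, not rxe code.
Everything in it (struct fake_qp, local_addrs, addr_is_local, pkt_is_loopback)
is a hypothetical name made up for illustration, and it assumes the RC
destination address never changes after the QP is set up. It only shows the
shape of "walk the address table once per RC QP, walk it per packet for UD":

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IPv4 addresses configured on the netdev. */
static const uint32_t local_addrs[] = { 0x0a000001, 0x0a000002, 0x0a000003 };

/* Counts full table walks, so the cost of each path is visible. */
static unsigned long lookups;

/* Linear walk over the address table (the expensive part). */
static bool addr_is_local(uint32_t daddr)
{
	size_t i;

	lookups++;
	for (i = 0; i < sizeof(local_addrs) / sizeof(local_addrs[0]); i++)
		if (local_addrs[i] == daddr)
			return true;
	return false;
}

enum qp_type { QP_RC, QP_UD };

/* Hypothetical QP: only the fields needed for this sketch. */
struct fake_qp {
	enum qp_type type;
	uint32_t daddr;   /* RC only: assumed fixed for the connection */
	bool lb_valid;    /* has the loopback decision been cached? */
	bool lb;          /* cached decision */
};

/*
 * RC: do the table walk on the first packet and reuse the answer.
 * UD: the destination can differ per packet, so walk every time.
 */
static bool pkt_is_loopback(struct fake_qp *qp, uint32_t daddr)
{
	if (qp->type != QP_RC)
		return addr_is_local(daddr);

	if (!qp->lb_valid) {
		qp->lb = addr_is_local(qp->daddr);
		qp->lb_valid = true;
	}
	return qp->lb;
}
```

With this shape, sending any number of packets on one RC QP costs a single
table walk, while UD still pays one walk per packet. A real version would
also have to invalidate the cached value when the netdev's addresses change,
which this sketch ignores.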

>
> I'll ask on Moni's response mail for alternatives.
>
> Thanks,
> Marcel
>
> > > +out:
> > > + rcu_read_unlock();
> > > + return same_rxe;
> > > +}
> > > +
> > > static int prepare4(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
> > > struct sk_buff *skb, struct rxe_av *av)
> > > {
> > > @@ -367,7 +388,7 @@ static int prepare4(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
> > > return -EHOSTUNREACH;
> > > }
> > >
> > > - if (!memcmp(saddr, daddr, sizeof(*daddr)))
> > > + if (addr4_same_rxe(rxe, daddr))
> > > pkt->mask |= RXE_LOOPBACK_MASK;
> > >
> > > prepare_udp_hdr(skb, htons(RXE_ROCE_V2_SPORT),
> > > @@ -384,6 +405,28 @@ static int prepare4(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
> > > return 0;
> > > }
> > >
> > > +static inline bool addr6_same_rxe(struct rxe_dev *rxe, struct in6_addr *daddr)
> > > +{
> >
> > Ditto
> >
> > > + struct inet6_dev *in6_dev;
> > > + struct inet6_ifaddr *ifp;
> > > + bool same_rxe = false;
> > > +
> > > + in6_dev = in6_dev_get(rxe->ndev);
> > > + if (!in6_dev)
> > > + return false;
> > > +
> > > + read_lock_bh(&in6_dev->lock);
> > > + list_for_each_entry(ifp, &in6_dev->addr_list, if_list)
> > > + if (!memcmp(&ifp->addr, daddr, sizeof(*daddr))) {
> > > + same_rxe = true;
> > > + goto out;
> > > + }
> > > +out:
> > > + read_unlock_bh(&in6_dev->lock);
> > > + in6_dev_put(in6_dev);
> > > + return same_rxe;
> > > +}
> > > +
> > > static int prepare6(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
> > > struct sk_buff *skb, struct rxe_av *av)
> > > {
> > > @@ -398,7 +441,7 @@ static int prepare6(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
> > > return -EHOSTUNREACH;
> > > }
> > >
> > > - if (!memcmp(saddr, daddr, sizeof(*daddr)))
> > > + if (addr6_same_rxe(rxe, daddr))
> > > pkt->mask |= RXE_LOOPBACK_MASK;
> > >
> > > prepare_udp_hdr(skb, htons(RXE_ROCE_V2_SPORT),
> > > --
> > > 2.9.4
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
