Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer

From: Doug Ledford
Date: Thu Mar 26 2015 - 10:09:57 EST


On Wed, 2015-03-25 at 16:09 +0100, Michael Wang wrote:
> My sincerely apologies for the corrupted mails, and thanks for Dan's kindly
> remind :-)
>
> There are too many lengthy code to check the transport type of IB device,
> or the link layer type of it's port, this patch set try to use some helper to
> refine and save us some code.
>
> TODO:
> Currently we inferred from the transport type and link layer type to identify
> the way of management, it will be better if we can directly get the indicator
> from vendor.
>
> Sean proposed one suggestion:
> https://www.mail-archive.com/linux-rdma@xxxxxxxxxxxxxxx/msg23339.html
>
> It may need a big work to adapt current implementation to utilize
> these flags elegantly.
>
> Also the performance concern on query_port() need to be addressed, may be
> some new callback like query_mgmt() could works.
>
> Michael Wang (2):
> [PATCH 1/2] IB/Verbs: Use helpers to check transport and link layer
> [PATCH 2/2] IB/Verbs: Use helpers to check IBoE technology
>
> ---
> drivers/infiniband/core/agent.c | 2 -
> drivers/infiniband/core/cm.c | 2 -
> drivers/infiniband/core/cma.c | 33 ++++++++++++------------------
> drivers/infiniband/core/mad.c | 6 ++---
> drivers/infiniband/core/multicast.c | 11 +++-------
> drivers/infiniband/core/sa_query.c | 14 ++++++------
> drivers/infiniband/core/ucm.c | 3 --
> drivers/infiniband/core/user_mad.c | 2 -
> drivers/infiniband/core/verbs.c | 5 +---
> drivers/infiniband/hw/mlx4/ah.c | 2 -
> drivers/infiniband/hw/mlx4/cq.c | 4 ---
> drivers/infiniband/hw/mlx4/mad.c | 14 +++---------
> drivers/infiniband/hw/mlx4/main.c | 8 ++-----
> drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 -
> drivers/infiniband/hw/mlx4/qp.c | 21 ++++++-------------
> drivers/infiniband/hw/mlx4/sysfs.c | 6 +----
> drivers/infiniband/ulp/ipoib/ipoib_main.c | 6 ++---
> include/rdma/ib_verbs.h | 30 +++++++++++++++++++++++++++
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 --
> 19 files changed, 87 insertions(+), 87 deletions(-)
>

I think this is a step in the right direction. However, as I brought up
in a different thread, I think it would be best if we start to clear up
some of the funny things in this space, such as calling iWARP link layer
InfiniBand.

One thing we didn't discuss before is that, even if changing these items
around in user space would break user space, changing them around in the
kernel won't break anything except maybe Lustre (which we can inform the
authors about the change so they can anticipate it and do the right
thing in their code) because we can fix up our return values we pass to
user space with no impact to them as it's not on a hot path. We can
then emulate the old way of setting all these variables to user space
while fixing them up in kernel space.

So, I would suggest that we fix things up thusly:

enum transport {
TRANSPORT_IB=1,
TRANSPORT_IWARP=2,
TRANSPORT_ROCE=4,
TRANSPORT_OPA=8,
TRANSPORT_USNIC=10,
};

#define HAS_SA(ibdev) ((ibdev)->transport & (TRANSPORT_IB|TRANSPORT_OPA))
#define HAS_JUMBO_SA(ibdev) ((ibdev)->transport & TRANSPORT_OPA))

or possibly

static bool ib_dev_has_sa(struct ibv_device *ibdev)
{
return ibdev->transport & (TRANSPORT_IB | TRANSPORT_OPA);
}

If we do this, then the only thing we have to fix up to preserve ABI
with user space is to make sure that any time we export an ibv_device
struct and any time we import the same, we convert from our new internal
representation to the old representation that user space expects. And
we also need to make a few changes in the sysfs code to display the
properties as things expect. But, that would allow us to fix up what I
see as a problem right now, which is that we hide the information we
need to know what sort of device we are working on in two different
fields: the transport and the link layer. Instead, just use one field
with enough variants that we can store all of the relevant information
we need in that one field. This has the benefit that any comparisons
that happen in hot paths will now always be a single bitwise comparison
and will no longer need to hit two separate variables for two separate
compares.



--
Doug Ledford <dledford@xxxxxxxxxx>
GPG KeyID: 0E572FDD


Attachment: signature.asc
Description: This is a digitally signed message part