Re: [PATCH rdma-next] RDMA/rdmavt: Decouple QP and SGE lists allocations

From: Jason Gunthorpe
Date: Tue May 25 2021 - 09:14:07 EST


On Thu, May 20, 2021 at 06:02:09PM -0400, Dennis Dalessandro wrote:

> > I don't want to encourage other drivers to do the same thing.
>
> I would imagine they would get the same push back we are getting here. I
> don't think this would encourage anyone honestly.

Then we are back to making infrastructure that is only useful for one,
arguably wrong, driver.

> > The correct thing to do today in 2021 is to use the standard NUMA
> > memory policy on already node-affine threads. The memory policy goes
> > into the kernel and normal non-_node allocations will obey it. When
> > combined with an appropriate node-affine HCA this will work as you are
> > expecting right now.
>
> So we shouldn't see any issue in the normal case is what you are
> saying. I'd like to believe that, proving it is not easy though.

Well, I said you have to set up the userspace properly; I'm not sure it
just works out of the box.
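
To make "set up the userspace properly" concrete, here is a minimal
sketch, not a tested recipe. It assumes the thread has already been
pinned to CPUs on the HCA's node, and it installs a per-thread memory
policy through the raw set_mempolicy(2) syscall. The sketch_* names and
the 16-word mask size are made up for illustration, and the constant is
copied from <linux/mempolicy.h> so the example builds without libnuma.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MPOL_PREFERRED in <linux/mempolicy.h>; repeated here (an
 * assumption of this sketch) to avoid depending on libnuma's <numaif.h>. */
#define SKETCH_MPOL_PREFERRED 1

#define SKETCH_BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: clear the mask and set only the bit for 'node'.
 * Returns 0 on success, -1 if 'node' does not fit in 'nlongs' words. */
int sketch_build_nodemask(int node, unsigned long *mask, size_t nlongs)
{
	assert(mask != NULL);
	if (node < 0 || (size_t)node >= nlongs * SKETCH_BITS_PER_ULONG)
		return -1;
	memset(mask, 0, nlongs * sizeof(*mask));
	mask[(size_t)node / SKETCH_BITS_PER_ULONG] |=
		1UL << ((size_t)node % SKETCH_BITS_PER_ULONG);
	return 0;
}

/* Hypothetical helper: ask the kernel to prefer 'node' for the calling
 * thread's future allocations. Returns 0 on success, -1 with errno set. */
int sketch_prefer_node(int node)
{
	unsigned long mask[16];
	size_t nlongs = sizeof(mask) / sizeof(mask[0]);

	if (sketch_build_nodemask(node, mask, nlongs) < 0) {
		errno = EINVAL;
		return -1;
	}
	/* The kernel treats maxnode as one more than the number of mask
	 * bits it reads, hence the +1. */
	return (int)syscall(SYS_set_mempolicy, SKETCH_MPOL_PREFERRED, mask,
			    nlongs * SKETCH_BITS_PER_ULONG + 1);
}
```

A thread would call sketch_prefer_node(n) once, before it creates its
QPs. The point is that only plain non-_node allocations made on its
behalf in the kernel can follow that policy; a _node allocation ignores
it. The call can fail with EPERM in restricted containers, so check the
return value.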

> > However you can't do anything like that while the kernel has the _node
> > annotations, that overrides the NUMA memory policy and breaks the
> > policy system!
>
> Does our driver doing this break the entire system? I'm not sure how that's
> possible.

It breaks your driver's part of it, and if we lift it to the core code
then it breaks all drivers, so it is a hard no-go.

> Is there an effort to get rid of these per node allocations so
> ultimately we won't have a choice at some point?

Unlikely, subtle stuff like this will just be left broken in drivers
nobody cares about.

Jason