RE: [EXTERNAL] Re: [PATCH 05/12] net: mana: Set the DMA device max page size

From: Ajay Sharma
Date: Wed May 18 2022 - 02:14:36 EST


Thanks Long.
Hello Jason,
I am the author of the patch.
To your comment below:
"As I've already said, you are supposed to set the value that limits to ib_sge and *NOT* the value that is related to ib_umem_find_best_pgsz. It is usually 2G because the ib_sge's typically work on a 32 bit length."

The ib_sge is limited by __sg_alloc_table_from_pages(), which uses ib_dma_max_seg_size(); that value is what the eth driver sets using dma_set_max_seg_size(). Currently our hw does not support PTEs larger than 2M.

So ib_umem_find_best_pgsz() takes PAGE_SZ_BM as an input. The bitmap has a bit set for each page size supported by the HW.

#define PAGE_SZ_BM (SZ_4K | SZ_8K | SZ_16K | SZ_32K | SZ_64K | SZ_128K \
| SZ_256K | SZ_512K | SZ_1M | SZ_2M)

Are you suggesting that the bitmap we are passing is too restrictive, or that we should not set this bitmap and let the function choose a default?
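
For illustration only, the interaction between the segment size cap and the page size bitmap can be modelled in userspace C. This is a simplified sketch, not the kernel's ib_umem_find_best_pgsz(): model_best_pgsz() and its parameters are hypothetical names, the SZ_* values are redefined locally, and the model assumes a single physically contiguous buffer that is split only at the segment size cap.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's SZ_* constants. */
#define SZ_4K   0x00001000ULL
#define SZ_8K   0x00002000ULL
#define SZ_16K  0x00004000ULL
#define SZ_32K  0x00008000ULL
#define SZ_64K  0x00010000ULL
#define SZ_128K 0x00020000ULL
#define SZ_256K 0x00040000ULL
#define SZ_512K 0x00080000ULL
#define SZ_1M   0x00100000ULL
#define SZ_2M   0x00200000ULL

/* Same set of page sizes as the bitmap quoted in this mail. */
#define PAGE_SZ_BM (SZ_4K | SZ_8K | SZ_16K | SZ_32K | SZ_64K | SZ_128K \
                    | SZ_256K | SZ_512K | SZ_1M | SZ_2M)

/*
 * Hypothetical model (assumption, not kernel code): split one contiguous
 * buffer into segments of at most max_seg bytes, then return the largest
 * page size in pgsz_bitmap that divides the buffer start, its length, and
 * every internal segment boundary. Returns 0 if no size in the bitmap fits.
 */
static uint64_t model_best_pgsz(uint64_t addr, uint64_t len,
                                uint64_t max_seg, uint64_t pgsz_bitmap)
{
    uint64_t mask = addr | len; /* alignment of the buffer as a whole */
    uint64_t best = 0;
    uint64_t off, bit;

    assert(max_seg != 0);

    /* Each internal segment boundary adds its own alignment constraint. */
    for (off = max_seg; off < len; off += max_seg)
        mask |= addr + off;

    /* Drop every page size larger than the lowest set bit of mask. */
    if (mask != 0) {
        uint64_t lowbit = mask & (~mask + 1);
        pgsz_bitmap &= (lowbit << 1) - 1;
    }

    /* The highest remaining bit is the best page size. */
    for (bit = 1; bit != 0 && bit <= pgsz_bitmap; bit <<= 1)
        if (pgsz_bitmap & bit)
            best = bit;

    return best;
}
```

Under this model, a 2M buffer at a 2M-aligned address yields SZ_64K when max_seg is SZ_64K, and SZ_2M when max_seg is either SZ_2M or 2G, because the bitmap already caps the result at 2M.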

Regards,
Ajay

-----Original Message-----
From: Jason Gunthorpe <jgg@xxxxxxxx>
Sent: Tuesday, May 17, 2022 5:04 PM
To: Long Li <longli@xxxxxxxxxxxxx>
Cc: Ajay Sharma <sharmaajay@xxxxxxxxxxxxx>; KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>; Wei Liu <wei.liu@xxxxxxxxxx>; Dexuan Cui <decui@xxxxxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
Subject: [EXTERNAL] Re: [PATCH 05/12] net: mana: Set the DMA device max page size

On Tue, May 17, 2022 at 08:04:58PM +0000, Long Li wrote:
> > Subject: Re: [PATCH 05/12] net: mana: Set the DMA device max page
> > size
> >
> > On Tue, May 17, 2022 at 07:32:51PM +0000, Long Li wrote:
> > > > Subject: Re: [PATCH 05/12] net: mana: Set the DMA device max
> > > > page size
> > > >
> > > > On Tue, May 17, 2022 at 02:04:29AM -0700, longli@xxxxxxxxxxxxxxxxx wrote:
> > > > > From: Long Li <longli@xxxxxxxxxxxxx>
> > > > >
> > > > > > The system chooses the default 64K page size if the device does
> > > > > > not specify the max page size the device can handle for DMA.
> > > > > > This does not work well when the device is registering a large
> > > > > > chunk of memory, in that a large page size is more efficient.
> > > > >
> > > > > Set it to the maximum hardware supported page size.
> > > >
> > > > For RDMA devices this should be set to the largest segment size
> > > > an ib_sge can take in when posting work. It should not be the
> > > > page size of MR. 2M is a weird number for that, are you sure it is right?
> > >
> > > Yes, this is the maximum page size used in hardware page tables.
> >
> > As I said, it should be the size of the sge in the WQE, not the
> > "hardware page tables"
>
> This driver uses the following code to figure out the largest page
> size for memory registration with hardware:
>
> page_sz = ib_umem_find_best_pgsz(mr->umem, PAGE_SZ_BM, iova);
>
> In this function, mr->umem is created with ib_dma_max_seg_size() as
> its max segment size when creating its sgtable.
>
> The purpose of setting DMA page size to 2M is to make sure this
> function returns the largest possible MR size that the hardware can
> take. Otherwise, this function will return 64k: the default DMA size.

As I've already said, you are supposed to set the value that limits to ib_sge and *NOT* the value that is related to ib_umem_find_best_pgsz. It is usually 2G because the ib_sge's typically work on a 32 bit length.

Jason
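
The "32 bit length" remark above can be illustrated with a small userspace sketch. It assumes the field layout of the kernel's struct ib_sge (a 64-bit addr, a 32-bit length, a 32-bit lkey); struct sge_model and max_pow2_sge_len() are hypothetical names introduced here, and reading "2G" as the largest power of two that fits in the length field is an interpretation, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the assumed struct ib_sge layout, for illustration. */
struct sge_model {
    uint64_t addr;   /* start address of the segment */
    uint32_t length; /* byte count: only 32 bits wide */
    uint32_t lkey;   /* local key of the memory region */
};

static_assert(sizeof(((struct sge_model *)0)->length) == 4,
              "length field is expected to be 32 bits");

/*
 * Hypothetical helper: largest power-of-two byte count that still fits
 * in the 32-bit length field.
 */
static uint64_t max_pow2_sge_len(void)
{
    uint64_t limit = UINT32_MAX;
    uint64_t p = 1;

    while ((p << 1) <= limit)
        p <<= 1;

    return p;
}
```

Here max_pow2_sge_len() returns 0x80000000, i.e. 2G, since 4G would overflow a 32-bit length.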