RE:(2) [PATCH] dma-buf: system_heap: avoid reclaim for order 4

From: 김재원
Date: Wed Jan 25 2023 - 23:42:53 EST


> On Wed, Jan 25, 2023 at 2:20 AM Jaewon Kim <jaewon31.kim@xxxxxxxxxxx> wrote:
> > > > On Tue, Jan 17, 2023 at 10:54 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jan 17, 2023 at 12:31 AM Jaewon Kim <jaewon31.kim@xxxxxxxxxxx> wrote:
> > > > > > > Using order 4 pages would be helpful for many IOMMUs, but it could spend
> > > > > > > quite much time in page allocation perspective.
> > > > > > >
> > > > > > > The order 4 allocation with __GFP_RECLAIM may spend much time in
> > > > > > > reclaim and compation logic. __GFP_NORETRY also may affect. These cause
> > > > > > > unpredictable delay.
> > > > > > >
> > > > > > > To get reasonable allocation speed from dma-buf system heap, use
> > > > > > > HIGH_ORDER_GFP for order 4 to avoid reclaim.
> > > > >
> > > > > Thanks for sharing this!
> > > > > The case where the allocation gets stuck behind reclaim under pressure
> > > > > does sound undesirable, but I'd be a bit hesitant to tweak numbers
> > > > > that have been used for a long while (going back to ion) without a bit
> > > > > more data.
> > > > >
> > > > > It might be good to also better understand the tradeoff of potential
> > > > > on-going impact to performance from using low order pages when the
> > > > > buffer is used. Do you have any details like or tests that you could
> > > > > share to help ensure this won't impact other users?
> > > > >
> > > > > TJ: Do you have any additional thoughts on this?
> > > > >
> > > > I don't have any data on how often we hit reclaim for mid order
> > > > allocations. That would be interesting to know. However the 70th
> > > > percentile of system-wide buffer sizes while running the camera on my
> > > > phone is still only 1 page, so it looks like this change would affect
> > > > a subset of use-cases.
> > > >
> > > > Wouldn't this change make it less likely to get an order 4 allocation
> > > > (under memory pressure)? The commit message makes me think the goal of
> > > > the change is to get more of them.
> > >
> > > Hello John Stultz
> > >
> > > I've been waiting for your next reply.
>
> Sorry, I was thinking you were gathering data on the tradeoffs. Sorry
> for my confusion.
>
> > > With my commit, we may gather less number of order 4 pages and fill the
> > > requested size with more number of order 0 pages. I think, howerver, stable
> > > allocation speed is quite important so that corresponding user space
> > > context can move on within a specific time.
> > >
> > > Not only compaction but reclaim also, I think, would be invoked more if the
> > > __GFP_RECLAIM is added on order 4. I expect the reclaim could be decreased
> > > if we move to order 0.
> > >
> >
> > Additionally I'd like to say the old legacy ion system heap also used the
> > __GFP_RECLAIM only for order 8, not for order 4.
> >
> > drivers/staging/android/ion/ion_system_heap.c
> >
> > static gfp_t high_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN |
> > __GFP_NORETRY) & ~__GFP_RECLAIM;
> > static gfp_t low_order_gfp_flags = GFP_HIGHUSER | __GFP_ZERO;
> > static const unsigned int orders[] = {8, 4, 0};
> >
> > static int ion_system_heap_create_pools(struct ion_page_pool **pools)
> > {
> > int i;
> >
> > for (i = 0; i < NUM_ORDERS; i++) {
> > struct ion_page_pool *pool;
> > gfp_t gfp_flags = low_order_gfp_flags;
> >
> > if (orders[i] > 4)
> > gfp_flags = high_order_gfp_flags;
>
>
> This seems a bit backwards from your statement. It's only removing
> __GFP_RECLAIM on order 8 (high_order_gfp_flags).

Oh sorry, my fault. I also read wrongly. But as far as I know, most of
AP chipset vendors have been using __GFP_RECLAIM only for order 0.
I can't say in detail though.

>
> So apologies again, but how is that different from the existing code?
>
> #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
> #define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
> #define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
> | __GFP_NORETRY) & ~__GFP_RECLAIM) \
> | __GFP_COMP)
> static gfp_t order_flags[] = {HIGH_ORDER_GFP, MID_ORDER_GFP, LOW_ORDER_GFP};
>
> Where the main reason we introduced the mid-order flags is to avoid
> the warnings on order 4 allocation failures when we'll fall back to
> order 0
>
> The only substantial difference I see between the old ion code and
> what we have now is the GFP_COMP addition, which is a bit hazy in my
> memory. I unfortunately don't have a record of why it was added (don't
> have access to my old mail box), so I suspect it was something brought
> up in private review. Dropping that from the low order flags probably
> makes sense as TJ pointed out, but this isn't what your patch is
> changing.
>
> Your patch is changing that for mid-order allocations we'll use the
> high order flags, so we'll not retry and not reclaim, so there will be
> more failing and falling back to single page allocations.
> This makes sense to make allocation time faster and more deterministic
> (I like it!), but potentially has the tradeoff of losing the
> performance benefit of using mid order page sizes.
>
> I suspect your change is a net win overall, as the cumulative effect
> of using larger pages probably won't benefit more than the large
> indeterministic allocation time, particularly under pressure.
>
> But because your change is different from what the old ion code did, I
> want to be a little cautious. So it would be nice to see some
> evaluation of not just the benefits the patch provides you but also of
> what negative impact it might have. And so far you haven't provided
> any details there.
>
> A quick example might be for the use case where mid-order allocations
> are causing you trouble, you could see how the performance changes if
> you force all mid-order allocations to be single page allocations (so
> orders[] = {8, 0, 0};) and compare it with the current code when
> there's no memory pressure (right after reboot when pages haven't been
> fragmented) so the mid-order allocations will succeed. That will let
> us know the potential downside if we have brief / transient pressure
> at allocation time that forces small pages.
>
> Does that make sense?

Let me try this. It make take some days. But I guess it depends on memory
status as you said. If there were quite many order 4 pages, then 8 4 0
should be faster than 8 0 0.

I don't know this is a right approach. In my opinion, except the specific
cases like right after reboot, there are not many order 4 pages. And
in determinisitic allocation time perspective, I think avoiding too long
allocations is more important than making faster with already existing
free order 4 pages.

BR
Jaewon Kim

>
> thanks
> -john