Re: [PATCH v4 2/4] mm: failfast mode with __GFP_NORETRY in alloc_contig_range

From: Michal Hocko
Date: Thu Jan 28 2021 - 02:55:06 EST


On Wed 27-01-21 12:42:45, Minchan Kim wrote:
> On Tue, Jan 26, 2021 at 08:44:49AM +0100, Michal Hocko wrote:
> > On Mon 25-01-21 11:33:36, Minchan Kim wrote:
> > > On Mon, Jan 25, 2021 at 02:12:00PM +0100, Michal Hocko wrote:
> > > > On Thu 21-01-21 09:55:00, Minchan Kim wrote:
> > > > > Contiguous memory allocation can be stalled due to waiting
> > > > > on page writeback and/or page lock, which causes unpredictable
> > > > > delay. It's an unavoidable cost for the requestor to get *big*
> > > > > contiguous memory, but it's expensive for *small* contiguous
> > > > > memory (e.g., order-4) because the caller could retry the request
> > > > > in a different range that would have easily migratable pages
> > > > > without stalling.
> > > > >
> > > > > This patch introduces __GFP_NORETRY as the compaction gfp_mask in
> > > > > alloc_contig_range so it will fail fast without blocking
> > > > > when it encounters pages that it would need to wait on.
> > > >
> > > > I am not against controlling how hard this allocator tries with the gfp
> > > > mask, but this changelog is rather devoid of any data and any user.
> > > >
> > > > It is also rather dubious to have retries when the caller says not to
> > > > retry.
> > >
> > > Since max_tries is 1 with ++tries, it shouldn't retry.
> >
> > OK, I missed that. This is tricky code. ASYNC mode should be
> > completely orthogonal to the retry count. Those are different things.
> > The page allocator does an explicit bail-out based on __GFP_NORETRY.
> > You should be doing the same.
>
> Before sending the next revision, let me check this part again.
>
> I want to use __GFP_NORETRY to indicate an "opportunistic, easy-to-fail
> attempt" and I want to use the ASYNC migrate_mode to help with that goal.
>
> Do you see the problem?

No, as I've said. This is a normal NORETRY policy. And ASYNC migration
is a mere implementation detail you do not have to bother your users
about. This is the semantic view. From the implementation POV, it should
be the gfp mask that drives decisions rather than a random (ASYNC) flag
controlling retries as you did here.
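To make the distinction concrete, here is a minimal userspace sketch (not kernel code, and not the code from the patch) of a loop where the gfp mask drives both the bail-out and, internally, the choice of migration mode. Everything in it is hypothetical: the SKETCH_GFP_* flag values, struct range_state, migrate_once(), sketch_alloc_contig_range() and the MAX_TRIES budget are illustrative stand-ins, not the kernel's real definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins: NOT the kernel's real gfp_t or flag values. */
typedef unsigned int gfp_t;
#define SKETCH_GFP_KERNEL  0x1u
#define SKETCH_GFP_NORETRY 0x2u

/* Loosely modeled on the kernel's enum migrate_mode (subset only). */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC };

#define MAX_TRIES 5 /* hypothetical retry budget for the default policy */

/* Hypothetical model of a pfn range being migrated. */
struct range_state {
	int transient_failures; /* attempts that fail with -EAGAIN first */
	bool needs_wait;        /* a page is under writeback or locked */
};

/* One migration pass: ASYNC refuses to block, SYNC is assumed to wait. */
static int migrate_once(struct range_state *rs, enum migrate_mode mode)
{
	if (rs->needs_wait && mode == MIGRATE_ASYNC)
		return -EBUSY;	/* would have to wait: fail instead */
	if (rs->transient_failures > 0) {
		rs->transient_failures--;
		return -EAGAIN;	/* transient failure, a retry may succeed */
	}
	return 0;
}

static int sketch_alloc_contig_range(gfp_t gfp_mask, struct range_state *rs,
				     int *attempts)
{
	/* Migration mode is an internal detail derived from the gfp mask. */
	enum migrate_mode mode = (gfp_mask & SKETCH_GFP_NORETRY) ?
				 MIGRATE_ASYNC : MIGRATE_SYNC;
	int tries = 0;
	int ret;

	assert(rs && attempts);
	for (;;) {
		ret = migrate_once(rs, mode);
		tries++;
		if (ret == 0)
			break;
		/* Explicit bail-out on NORETRY, independent of the mode. */
		if (gfp_mask & SKETCH_GFP_NORETRY)
			break;
		if (tries >= MAX_TRIES)
			break;
	}
	*attempts = tries;
	return ret;
}
```

With SKETCH_GFP_NORETRY the loop stops after the first failure, whether that was an -EBUSY from refusing to wait or a transient -EAGAIN; with SKETCH_GFP_KERNEL a range that needs two retries succeeds on the third attempt. The caller only expresses the policy through the mask, and the ASYNC mode never shows up in its interface.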

--
Michal Hocko
SUSE Labs