Re: [PATCH RFC] rcu/tree: Refactor object allocation and try harder for array allocation

From: Johannes Weiner
Date: Wed Apr 22 2020 - 10:57:57 EST


On Thu, Apr 16, 2020 at 11:01:00AM -0700, Paul E. McKenney wrote:
> On Thu, Apr 16, 2020 at 09:17:45AM -0400, Joel Fernandes wrote:
> > On Thu, Apr 16, 2020 at 12:30:07PM +0200, Uladzislau Rezki wrote:
> > > I have a question about dynamic attaching of the rcu_head. Do you think
> > > that we should drop it? We have it because it requires only 8 + sizeof(struct rcu_head)
> > > bytes and is used when we cannot allocate the one page wanted for the array, which is a much
> > > larger request. Dynamic attaching can therefore succeed, because it uses SLAB and requests much
> > > less memory than one page. That gives a higher chance of bypassing synchronize_rcu()
> > > and inlining freeing on a stack.
> > >
> > > I agree that we should not use GFP_* flags; instead we could go with GFP_NOWAIT |
> > > __GFP_NOWARN, for head attaching only. We would also drop GFP_ATOMIC to keep
> > > the atomic reserves for others.
>
> I must defer to people who understand the GFP flags better than I do.
> The suggestion of __GFP_RETRY_MAYFAIL for no memory pressure (or maybe
> when the CPU's reserve is not yet full) and __GFP_NORETRY otherwise came
> from one of these people. ;-)

The exact flags we want here depend somewhat on the rate and size of
kfree_rcu() bursts we can expect. We may want to start with one set
and instrument allocation success rates.
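
To make "instrument allocation success rates" concrete, here is a
minimal user-space sketch, assuming one pair of counters per candidate
flag set. The names alloc_stats, alloc_stats_record() and
alloc_stats_success_pct() are made up for illustration; in the kernel
itself, per-CPU counters or tracepoints would be the more likely tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters for one candidate set of allocation flags. */
struct alloc_stats {
	uint64_t attempts;	/* every allocation attempt */
	uint64_t failures;	/* attempts that returned NULL */
};

/* Record the outcome of a single allocation attempt. */
static void alloc_stats_record(struct alloc_stats *s, bool succeeded)
{
	s->attempts++;
	if (!succeeded)
		s->failures++;
}

/* Success rate in whole percent; reports 100 if nothing was attempted. */
static unsigned int alloc_stats_success_pct(const struct alloc_stats *s)
{
	assert(s->failures <= s->attempts);
	if (s->attempts == 0)
		return 100;
	return (unsigned int)(((s->attempts - s->failures) * 100) / s->attempts);
}
```

Comparing the percentage across flag sets under the same kfree_rcu()
load would show whether the cheaper flags are good enough.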

Memory tends to be fully consumed by the filesystem cache, so some
form of light reclaim is necessary for almost all allocations.

GFP_NOWAIT won't do any reclaim by itself, but it'll wake kswapd.
Kswapd maintains a small pool of free pages so that even allocations
that are allowed to enter reclaim usually don't have to. It would be
safe for RCU to dip into that.

However, there are some cons to using it:

- Depending on kfree_rcu() burst size, this pool could be exhausted
(it's usually about half a percent of memory, but is affected by
sysctls), and NOWAIT allocations would then fail until kswapd has
caught up.

- This pool is shared by all GFP_NOWAIT users, and many (most? all?)
of them cannot actually sleep. Often they would have to drop locks,
restart list iterations, or suffer some other form of deterioration to
work around failing allocations.
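
For a feel of the numbers in the first point, here is a toy user-space
model, not kernel code. It assumes 4 KiB pages, takes the "half a
percent" estimate as a 1-in-200 fraction, and assumes kswapd refills
nothing while the burst is in flight. The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumed page size */

/*
 * Rough size of kswapd's free-page pool in pages. one_in is the
 * fraction of memory kept free (200 means about half a percent);
 * real systems derive this from watermarks tuned by sysctls.
 */
static uint64_t kswapd_pool_pages(uint64_t mem_bytes, uint64_t one_in)
{
	assert(one_in > 0);
	return mem_bytes / PAGE_SIZE_BYTES / one_in;
}

/*
 * Number of single-page NOWAIT allocations that fail when a burst
 * arrives faster than kswapd can refill the pool.
 */
static uint64_t nowait_burst_failures(uint64_t pool_pages, uint64_t burst_pages)
{
	return burst_pages > pool_pages ? burst_pages - pool_pages : 0;
}
```

Under these assumptions a 16 GiB machine has a pool of 20971 pages
(about 82 MiB), so a burst that needs 30000 pages would see 9029
failures, while a burst of 1000 pages would see none.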

Since RCU wouldn't have anything better to do than sleep at this
juncture, it may as well join the reclaim effort.

Using __GFP_NORETRY or __GFP_RETRY_MAYFAIL would allow it to do that
without exerting too much pressure on the VM.
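
As a rough illustration of the difference, here is a user-space sketch
with stand-in bit values (the MODEL_ prefix marks them as not the
kernel's; the real definitions are in include/linux/gfp.h). It mirrors
how GFP_NOWAIT only carries the kswapd-wakeup bit while GFP_KERNEL also
permits direct reclaim. kfree_rcu_gfp() is a hypothetical helper that
applies the pressure-based choice quoted above, assuming the allocating
context is allowed to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bit values for illustration only. */
#define MODEL_GFP_IO			0x01u
#define MODEL_GFP_FS			0x02u
#define MODEL_GFP_DIRECT_RECLAIM	0x04u	/* caller may reclaim itself */
#define MODEL_GFP_KSWAPD_RECLAIM	0x08u	/* wake kswapd when low */
#define MODEL_GFP_NORETRY		0x10u
#define MODEL_GFP_RETRY_MAYFAIL		0x20u
#define MODEL_GFP_NOWARN		0x40u

/* Composites mirroring how the kernel builds the two masks. */
#define MODEL_GFP_NOWAIT	(MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_KERNEL	(MODEL_GFP_DIRECT_RECLAIM | \
				 MODEL_GFP_KSWAPD_RECLAIM | \
				 MODEL_GFP_IO | MODEL_GFP_FS)

/* True if an allocation with these flags may enter reclaim itself. */
static bool may_direct_reclaim(unsigned int gfp)
{
	return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/* Hypothetical policy: give up early under pressure, try harder otherwise. */
static unsigned int kfree_rcu_gfp(bool under_pressure)
{
	unsigned int gfp = MODEL_GFP_KERNEL | MODEL_GFP_NOWARN;

	gfp |= under_pressure ? MODEL_GFP_NORETRY : MODEL_GFP_RETRY_MAYFAIL;
	/* Exactly one of the two retry modifiers is set. */
	assert(!(gfp & MODEL_GFP_NORETRY) != !(gfp & MODEL_GFP_RETRY_MAYFAIL));
	return gfp;
}
```

Either result of kfree_rcu_gfp() lets the caller take part in reclaim,
which MODEL_GFP_NOWAIT does not; the retry modifier only bounds how
hard it tries before failing.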