Re: [PATCH v2 1/3] mm: Fix dropped memcg from mem cgroup soft limit tree

From: Michal Hocko
Date: Mon Mar 08 2021 - 03:35:05 EST


On Fri 05-03-21 11:07:59, Tim Chen wrote:
>
>
> On 3/5/21 1:11 AM, Michal Hocko wrote:
> > On Thu 04-03-21 09:35:08, Tim Chen wrote:
> >>
> >>
> >> On 2/18/21 11:13 AM, Michal Hocko wrote:
> >>
> >>>
> >>> Fixes: 4e41695356fb ("memory controller: soft limit reclaim on contention")
> >>> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> >>>
> >>> Thanks!
> >>>> ---
> >>>> mm/memcontrol.c | 6 +++++-
> >>>> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>>> index ed5cc78a8dbf..a51bf90732cb 100644
> >>>> --- a/mm/memcontrol.c
> >>>> +++ b/mm/memcontrol.c
> >>>> @@ -3505,8 +3505,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> >>>>  			loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
> >>>>  			break;
> >>>>  	} while (!nr_reclaimed);
> >>>> -	if (next_mz)
> >>>> +	if (next_mz) {
> >>>> +		spin_lock_irq(&mctz->lock);
> >>>> +		__mem_cgroup_insert_exceeded(next_mz, mctz, excess);
> >>>> +		spin_unlock_irq(&mctz->lock);
> >>>>  		css_put(&next_mz->memcg->css);
> >>>> +	}
> >>>>  	return nr_reclaimed;
> >>>>  }
> >>>>
> >>>> --
> >>>> 2.20.1
> >>>
> >>
> >> Mel,
> >>
> >> Reviewing this patch a bit more, I realize that there is a chance that the removed
> >> next_mz could be inserted back into the tree by a memcg_check_events call
> >> that happens in between. So we need to make sure that next_mz
> >> is indeed off the tree, and update the excess value, before adding it
> >> back. I have updated the patch as shown below.
> >
> > This scenario is certainly possible but it shouldn't really matter much
> > as __mem_cgroup_insert_exceeded bails out when the node is on the tree
> > already.
> >
>
> Makes sense. We should still update the excess value with
>
> + excess = soft_limit_excess(next_mz->memcg);
> + __mem_cgroup_insert_exceeded(next_mz, mctz, excess);
>
> before doing insertion. The excess value was recorded from previous
> mz in the loop and needs to be updated to that of next_mz.

Yes. Sorry, I have missed that part previously.
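
For illustration only, here is a minimal userspace sketch of the two points
agreed on above: the insert helper bails out when the node is already on the
tree, and the excess is recomputed for next_mz right before the insertion
instead of reusing the value left over from the previous mz in the loop.
Every toy_* name is a hypothetical stand-in and not the real mm/memcontrol.c
definition; the rb-tree is reduced to a counter and the mctz->lock handling
is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumption, not real code). */
struct toy_memcg {
	unsigned long usage;
	unsigned long soft_limit;
};

struct toy_mz {
	struct toy_memcg *memcg;
	unsigned long usage_in_excess;
	bool on_tree;
};

struct toy_tree {
	size_t nr_nodes;	/* stands in for the rb-tree of exceeders */
};

/* How far the group is above its soft limit, or 0 if it is below it. */
static unsigned long toy_soft_limit_excess(const struct toy_memcg *memcg)
{
	if (memcg->usage > memcg->soft_limit)
		return memcg->usage - memcg->soft_limit;
	return 0;
}

/* Models the bail-out: a node that is already on the tree is left alone. */
static void toy_insert_exceeded(struct toy_mz *mz, struct toy_tree *tree,
				unsigned long excess)
{
	if (mz->on_tree)
		return;
	mz->usage_in_excess = excess;
	if (!excess)
		return;		/* nothing to track when there is no excess */
	tree->nr_nodes++;
	mz->on_tree = true;
}

/* Tail of the reclaim loop: refresh the excess for next_mz, then reinsert. */
static void toy_reinsert_next(struct toy_mz *next_mz, struct toy_tree *tree)
{
	unsigned long excess;

	assert(tree != NULL);
	if (!next_mz)
		return;
	excess = toy_soft_limit_excess(next_mz->memcg);
	toy_insert_exceeded(next_mz, tree, excess);
}
```

Calling toy_reinsert_next() twice on the same node leaves the tree unchanged
the second time, which is why a concurrent reinsertion is harmless here.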
--
Michal Hocko
SUSE Labs