Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

From: Eric W. Biederman
Date: Fri Jun 01 2018 - 10:34:17 EST


Michal Hocko <mhocko@xxxxxxxxxx> writes:

> On Thu 31-05-18 20:07:28, Eric W. Biederman wrote:
>> Michal Hocko <mhocko@xxxxxxxxxx> writes:
>>
>> > On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> >> This function searches for a new mm owner in children and siblings,
>> >> and then iterates over all processes in the system in the unlikely case.
>> >> Although the case is unlikely, its probability grows with the number
>> >> of processes in the system. The time spent on iterations also grows.
>> >> I regularly observe mm_update_next_owner() in crash dumps (not related
>> >> to this function) of nodes with many processes (20K+), so it looks
>> >> like it is not such an unlikely case.
>> >
>> > Did you manage to find the pattern that forces mm_update_next_owner to
>> > the slow paths? This really shouldn't trigger very often. If we can fall
>> > back easily then I suspect that we should be better off reconsidering
>> > mm->owner and try to come up with something more clever. I had a
>> > patch to remove owner a few years back. It needed some work to finish but
>> > maybe that would be better than trying to make a non-scalable thing suck
>> > less.
>>
>> Reading through the code I just found a trivial pattern that triggers
>> this. Create a multi-threaded process. Have the thread group leader
>> (the first thread) exit.
>
> Hmm, I thought that we try to iterate over threads in the same thread
> group first. But we are not doing that. Anyway just CLONE_VM without
> CLONE_THREAD would achieve the same pathological path but that should be
> rare.

Yes, if the child exited. The code searches the children and siblings
but not the parents of the process that exited.

> Group leader exiting early without tearing down the whole thread
> group should be quite rare as well. No question that somebody might do
> that on purpose though...

The group leader exiting early is a completely legitimate and reasonable
thing to do, even if it is rare.

I think all it would take is one program like that in a workload for
the performance to descend into something unpleasant.
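
For reference, a minimal sketch of the pattern described above (a
multi-threaded process whose thread group leader exits first), assuming
Linux with POSIX threads. The names worker() and run_leader_exit_demo()
are made up for illustration and are not part of the patch series. The
one-second sleep is only an assumption that gives the leader time to exit
before the worker does.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker thread: it outlives the thread group leader. */
static void *worker(void *arg)
{
	(void)arg;
	sleep(1);      /* assumption: long enough for the leader to exit */
	_exit(42);     /* terminates the whole process with status 42 */
	return NULL;   /* not reached */
}

/*
 * Fork a child in which the thread group leader exits early while a
 * second thread keeps running. Returns the child's exit status, or -1
 * on failure.
 */
int run_leader_exit_demo(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		pthread_t t;

		if (pthread_create(&t, NULL, worker, NULL) != 0)
			_exit(1);
		/* The leader exits here; the process lives on in worker(). */
		pthread_exit(NULL);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_leader_exit_demo() from any test program is enough to
exercise the leader-exits-first case. Nothing about it is exotic, which
is why I consider the pattern legitimate.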

Eric