Re: [PATCH 4/5] mm/hotplug: Avoid RCU stalls when removing large amounts of memory

From: Michal Hocko
Date: Mon Jun 17 2019 - 04:26:53 EST


On Mon 17-06-19 17:57:16, Alastair D'Silva wrote:
> > -----Original Message-----
> > From: Michal Hocko <mhocko@xxxxxxxxxx>
> > Sent: Monday, 17 June 2019 5:47 PM
> > To: Alastair D'Silva <alastair@xxxxxxxxxxx>
> > Cc: alastair@xxxxxxxxxxx; Arun KS <arunks@xxxxxxxxxxxxxx>; Mukesh Ojha
> > <mojha@xxxxxxxxxxxxxx>; Logan Gunthorpe <logang@xxxxxxxxxxxx>; Wei
> > Yang <richard.weiyang@xxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>;
> > Ingo Molnar <mingo@xxxxxxxxxx>; linux-mm@xxxxxxxxx; Qian Cai
> > <cai@xxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Andrew Morton
> > <akpm@xxxxxxxxxxxxxxxxxxxx>; Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>;
> > Baoquan He <bhe@xxxxxxxxxx>; David Hildenbrand <david@xxxxxxxxxx>;
> > Josh Poimboeuf <jpoimboe@xxxxxxxxxx>; Pavel Tatashin
> > <pasha.tatashin@xxxxxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Oscar
> > Salvador <osalvador@xxxxxxxx>; Jiri Kosina <jkosina@xxxxxxx>;
> > linux-kernel@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH 4/5] mm/hotplug: Avoid RCU stalls when removing large
> > amounts of memory
> >
> > On Mon 17-06-19 14:36:30, Alastair D'Silva wrote:
> > > From: Alastair D'Silva <alastair@xxxxxxxxxxx>
> > >
> > > When removing sufficiently large amounts of memory, we trigger RCU
> > > stall detection. By periodically calling cond_resched(), we avoid
> > > bogus stall warnings.
> > >
> > > Signed-off-by: Alastair D'Silva <alastair@xxxxxxxxxxx>
> > > ---
> > > mm/memory_hotplug.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index e096c987d261..382b3a0c9333 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -578,6 +578,9 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> > >  		__remove_section(zone, __pfn_to_section(pfn), map_offset,
> > >  				 altmap);
> > >  		map_offset = 0;
> > > +
> > > +		if (!(i & 0x0FFF))
> > > +			cond_resched();
> >
> > We already have a cond_resched() before __remove_section(). Why is an
> > additional one needed?
>
> I was getting stalls when removing ~1TB of memory.

Have you debugged what the source of the stall is? We already call
cond_resched() once per memory section, which should be a constant unit
of work regardless of the total amount of memory being removed.
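
To put rough numbers on that, here is a userspace sketch (not kernel code)
that counts yield points for a ~1TB removal under both schemes. The 128MB
section size is only an assumption for illustration, since the real value is
architecture-specific, and the helper names are made up for this example.

```c
#include <stdint.h>

/*
 * Assumed section size, for illustration only. The real value is
 * architecture-specific (1UL << SECTION_SIZE_BITS).
 */
#define EXAMPLE_SECTION_BYTES (128ULL << 20)

/* Number of sections needed to cover 'bytes' of memory, rounded up. */
static uint64_t sections_for(uint64_t bytes)
{
	return (bytes + EXAMPLE_SECTION_BYTES - 1) / EXAMPLE_SECTION_BYTES;
}

/* Yield points when rescheduling once per section (the existing loop). */
static uint64_t yields_per_section(uint64_t nr_sections)
{
	uint64_t i, yields = 0;

	for (i = 0; i < nr_sections; i++)
		yields++;	/* stands in for cond_resched() */
	return yields;
}

/* Extra yield points contributed by the proposed "!(i & 0x0FFF)" check. */
static uint64_t yields_masked(uint64_t nr_sections)
{
	uint64_t i, yields = 0;

	for (i = 0; i < nr_sections; i++)
		if (!(i & 0x0FFF))
			yields++;	/* fires only every 4096th iteration */
	return yields;
}
```

Under that assumption, 1TB is 8192 sections, so the existing loop already
yields 8192 times while the masked check adds only 2 more (at i = 0 and
i = 4096). If that holds on the affected machine, the stall more likely comes
from the work done inside a single section than from the loop itself.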
--
Michal Hocko
SUSE Labs