Re: [PATCH 3/4] mm, page_alloc: Drain per-cpu pages from workqueue context

From: Mel Gorman
Date: Mon Jan 23 2017 - 11:51:05 EST


On Mon, Jan 23, 2017 at 05:29:20PM +0100, Petr Mladek wrote:
> On Fri 2017-01-20 15:26:06, Mel Gorman wrote:
> > On Fri, Jan 20, 2017 at 03:26:05PM +0100, Vlastimil Babka wrote:
> > > > @@ -2392,8 +2404,24 @@ void drain_all_pages(struct zone *zone)
> > > >  		else
> > > >  			cpumask_clear_cpu(cpu, &cpus_with_pcps);
> > > >  	}
> > > > -	on_each_cpu_mask(&cpus_with_pcps, (smp_call_func_t) drain_local_pages,
> > > > -								zone, 1);
> > > > +
> > > > +	if (works) {
> > > > +		for_each_cpu(cpu, &cpus_with_pcps) {
> > > > +			struct work_struct *work = per_cpu_ptr(works, cpu);
> > > > +			INIT_WORK(work, drain_local_pages_wq);
> > > > +			schedule_work_on(cpu, work);
> > >
> > > This translates to queue_work_on(), which has the comment of "We queue
> > > the work to a specific CPU, the caller must ensure it can't go away.",
> > > so is this safe? lru_add_drain_all() uses get_online_cpus() around this.
> > >
> >
> > get_online_cpus() would be required.
> >
> > > schedule_work_on() also uses the generic system_wq, while lru drain has
> > > its own workqueue with WQ_MEM_RECLAIM so it seems that would be useful
> > > here as well?
> > >
> >
> > I would be reluctant to introduce a dedicated queue unless there was a
> > definite case where an OOM occurred because pages were pinned on per-cpu
> > lists and couldn't be drained because the buddy allocator was depleted.
> > As it was, I thought the fallback case was excessively paranoid.
>
> I guess that you know it but it is not clear from the above paragraph.
>
> WQ_MEM_RECLAIM makes sure that there is a rescue worker available.
> It is used when all workers are busy (blocked by an allocation
> request) and a new worker (kthread) cannot be forked because
> the fork would need an allocation as well.
>
> The fallback below solves the situation when struct work cannot
> be allocated. But it does not solve the situation when there is
> no worker to actually process the work. I am not sure if this
> is relevant for drain_all_pages().
>

I'm aware of the situation but, in itself, I still don't think it justifies
a dedicated workqueue. The main call to drain_all_pages() under reclaim
pressure is dubious because it's easy to trigger. For example, consider two
contenders for memory that are each doing a streaming read or a large number
of anonymous faults. Reclaim can be making progress but the two are racing
with each other to keep the watermarks above min and draining frequently.
The IPIs for a fairly normal situation are bad enough and even the workqueue
work isn't particularly welcome.

It would make more sense overall to move the unreserve and drain logic
into the nearly-OOM path, but that would likely be overkill. I'd only want
to look into that, or a dedicated workqueue, if there is a case of an OOM
triggered when a large number of CPUs had per-cpu pages available.

--
Mel Gorman
SUSE Labs