Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Michal Hocko
Date: Tue Feb 07 2017 - 10:35:08 EST


On Tue 07-02-17 15:19:11, Michal Hocko wrote:
> On Tue 07-02-17 13:58:46, Mel Gorman wrote:
> > On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote:
> [...]
> > > Anyway, shouldn't it be sufficient to disable preemption
> > > on drain_local_pages_wq?
> >
> > That would be sufficient for a hot-removed CPU moving the drain request
> > to another CPU and avoiding any scheduling events.
> >
> > > The CPU hotplug callback will not preempt us
> > > and so we cannot work on the same cpus, right?
> > >
> >
> > I don't see a specific guarantee that it cannot be preempted and it
> > would depend on the exact cpu hotplug implementation which is subject
> > to quite a lot of change.
>
> But we do not care about the whole cpu hotplug code. The only part we
> really do care about is the race inside drain_pages_zone and that will
> run in an atomic context on the specific CPU.
>
> You are absolutely right that using the mutex is safe as well but the
> hotplug path is already littered with locks and adding one more to the
> picture doesn't sound great to me. So I would really like to not use a
> lock if that is possible and safe (with a big fat comment of course).

And with the full changelog. I hope I haven't missed anything this time.
---