Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Christoph Lameter
Date: Thu Feb 09 2017 - 12:26:29 EST


On Thu, 9 Feb 2017, Thomas Gleixner wrote:

> You are just not getting it, really.
>
> The problem is that this for_each_online_cpu() is racy against a concurrent
> hot unplug and therefore can queue stuff for a CPU that is no longer online.
> That's what the mm folks tried to avoid by preventing a CPU hotplug
> operation before entering that loop.

With a stop_machine() action it is NOT racy, because the machine goes into
a special kernel state that guarantees that key operating system structures
are not touched. See mm/page_alloc.c's use of that characteristic to build
zonelists. Thus no CPU can be in the middle of for_each_online_cpu() and
related tasks while the change is made (unless one does not disable
preemption ... but that is a given if a spinlock has been taken).

> > Let's get rid of get_online_cpus() etc.
>
> And that solves what?

It gets rid of future issues with serialization in paths where we need to
take a lock and still do for_each_online_cpu().

> Can you please start to understand the scope of the whole hotplug machinery
> including the requirements for get_online_cpus() before you waste
> everybody's time with your uninformed and half-baked proposals?

It's an obvious solution to the issues that have arisen multiple times with
get_online_cpus() within the slab allocators. The hotplug machinery should
make things as easy as possible for other people, and having these
get_online_cpus() calls everywhere complicates things.