Re: [PATCH v6] workqueue: Fix edge cases for calc of pool's cpumask

From: Michael Bringmann
Date: Thu Jul 27 2017 - 15:08:05 EST

On 07/27/2017 01:31 PM, Tejun Heo wrote:
> On Thu, Jul 27, 2017 at 01:15:48PM -0500, Michael Bringmann wrote:
>>
>> On NUMA systems with dynamic processors, the content of the cpumask
>> may change over time. As new processors are added via DLPAR operations,
>> workqueues are created for them. Depending upon the order in which CPUs
>> are added/removed, we may run into problems with the content of the
>> cpumask used by the workqueues. This patch deals with situations where
>> the online cpumask for a node is a proper superset of possible cpumask
>> for the node. It also deals with edge cases where the order in which
>> CPUs are removed/added from the online cpumask may leave the set for a
>> node empty, and require execution by CPUs on another node.
>>
>> In these and other cases, the patch attempts to ensure that a valid,
>> usable cpumask is used to set up newly created pools for workqueues.
>> This patch provides a fix for NUMA systems which can add/subtract
>> processors dynamically. The patch is expected to be an intermediate
>> one while developers search for any underlying issues.
>
> Please start with describing what the underlying problem is - CPU <->
> NUMA node mapping change on powerpc. The mapping shouldn't change,
> not just for workqueue, but because we don't have any kind of
> synchronization around the mapping throughout allocation paths. And
> then, please describe how this patch can at least prevent immediate
> crashes in a lot of cases.

How about this:

The underlying problem is the ordering of events when CPUs are added to (or
removed from) a NUMA system, for example by DLPAR operations on powerpc,
relative to when that information is used. The CPUs present are assigned to
nodes, and workqueues and their supporting infrastructure are created to use
the CPUs in a node. Workqueues are created at boot time and are updated or
created as CPUs are added or removed. However, there is little or no
synchronization or ordering among these events, so the data structures that
map CPUs to nodes may not yet be updated when the workqueue infrastructure for
a node is built. As a result, an invalid cpumask attribute can be attached to
a newly created workqueue before its CPUs have been properly registered with,
and published to, a node.

This patch attempts to impose a partial ordering of events within the
workqueue code by not using a newly calculated cpumask as the value of a
workqueue attribute until it has valid content. Until that calculation
succeeds, the workqueue code must either delay creating new workqueues or use
a previously calculated cpumask attribute that is known to be valid.

This patch attempts to ensure that a valid, usable cpumask is used to set up
newly created pools for workqueues. It provides a fix for NUMA systems that
can add or remove processors dynamically, and is intended as an interim
measure while developers find and correct any underlying issues.

>
> Thanks.
>

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@xxxxxxxxxxxxxxxxxx