Re: [PATCH v2] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node

From: Laurent Vivier
Date: Wed Feb 20 2019 - 12:57:10 EST


On 20/02/2019 18:08, Peter Zijlstra wrote:
> On Wed, Feb 20, 2019 at 05:55:20PM +0100, Laurent Vivier wrote:
>> index 3f35ba1d8fde..372278605f0d 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -1651,6 +1651,7 @@ void sched_init_numa(void)
>> */
>> tl[i++] = (struct sched_domain_topology_level){
>> .mask = sd_numa_mask,
>> + .flags = SDTL_OVERLAP,
>
> This makes no sense whatsoever. The NUMA identity node should not have
> overlap with other domains.
>
> Are you sure this is not because of the utterly broken powerpc nonsense
> where they move CPUs between nodes?

No, I'm not sure. This is why I've Cc'ed the powerpc folks. My conclusion
is based only on the before/after changes.

I've tested some patches from powerpc ML, but they don't fix this problem:
  powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
  powerpc/pseries: Perform full re-add of CPU for topology update post-migration

So the only reason I can see for a corrupted sched_group list is that the
sched_domain_span() function doesn't return a correct cpumask for the
domain once a new CPU is added.
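
To make the hypothesis concrete, here is a toy userspace sketch of the kind
of check I have in mind. It is not kernel code: toy_cpumask_t,
toy_cpu_in_mask() and toy_uncovered_cpus() are hypothetical names, and a
plain 64-bit word stands in for a real cpumask. It only shows how a span
that was computed before the hotplug would miss the new CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a cpumask (assumption): bit n set means CPU n is present. */
typedef uint64_t toy_cpumask_t;

/* Return true if 'cpu' is set in 'mask'; CPUs >= 64 are never present. */
static inline bool toy_cpu_in_mask(unsigned int cpu, toy_cpumask_t mask)
{
	return cpu < 64 && ((mask >> cpu) & 1u) != 0;
}

/*
 * Count the CPUs that are online but not covered by the domain span.
 * A non-zero result means the span is stale with respect to 'online'.
 */
static unsigned int toy_uncovered_cpus(toy_cpumask_t online, toy_cpumask_t span)
{
	toy_cpumask_t missing = online & ~span;
	unsigned int count = 0;

	while (missing) {
		missing &= missing - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}
```

For example, with a span of 0x0f (CPUs 0-3) and an online mask of 0x1f
after CPU 4 is added, toy_uncovered_cpus() reports one uncovered CPU, and
toy_cpu_in_mask(4, span) is false.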

Thanks,
Laurent