Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

From: Kamezawa Hiroyuki
Date: Wed Apr 01 2015 - 22:55:12 EST


On 2015/04/02 10:36, Gu Zheng wrote:
Hi Kame, TJ,

On 04/01/2015 04:30 PM, Kamezawa Hiroyuki wrote:

On 2015/04/01 12:02, Tejun Heo wrote:
On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote:
Now, hot-added cpus will have the lowest free cpu id.

Because of this, on most systems that only ever hot-add cpus, cpu ids are always
contiguous even after cpu hot-add.
In enterprise environments, a change here would be considered an incompatibility.

Determining cpuid <-> lapicid at boot will make cpuids sparse. That may break
existing scripts or configuration/resource management software.

Ugh... so, cpu number allocation on hot-add is part of userland
interface that we're locked into?

We checked most of the RHEL7 packages and haven't found a problem yet.
But, for example, we know that a performance test team's test program assumed contiguous
cpuids and it failed. It was an easy case because we could ask them to fix the application,
but I guess there will be some number of customers who assume cpuids are contiguous.

Tying hotplug and id allocation
order together usually isn't a good idea. What if the cpu up fails
while running the notifiers? The ID is already allocated and the next
cpu being brought up will be after a hole anyway. Is this even
actually gonna affect userland?


Maybe. It's not fail-safe but....

In general, all kernel engineers (and skilled userland engineers) know that
cpuids cannot always be contiguous and that cpuids/nodeids should be checked before
running programs. I think most engineers should be aware of that, but many
users have their own assumptions :(
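
To illustrate what "checking cpuids before running programs" could look like on
the userland side, here is a minimal sketch in Python. It is not taken from any
existing tool; the function names parse_cpulist and read_online_cpus are
hypothetical. It assumes the plain "a-b,c" range-list format that
/sys/devices/system/cpu/online prints, and it does not handle any other syntax.

```python
def parse_cpulist(text):
    """Parse a cpulist string such as "0-3,8-11" into a sorted list of ints.

    Assumption: only single ids and inclusive "lo-hi" ranges separated by
    commas appear, which is what the sysfs "online" file prints.
    """
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # empty input, e.g. an empty file
        if "-" in chunk:
            lo, hi = chunk.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def read_online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the ids of the cpus that are online right now (hypothetical helper)."""
    with open(path) as f:
        return parse_cpulist(f.read())


if __name__ == "__main__":
    # Iterate over the ids that actually exist instead of range(ncpus).
    for cpu in parse_cpulist("0-3,8-11"):
        print("cpu", cpu)
```

A program written this way keeps working when there is a hole in the id space,
because it never assumes that cpu N-1 exists just because N cpus are online.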

Basically, I don't have strong objections, you're right technically.

In summary...
- users should not assume cpuids are contiguous.
- all possible ids should be fixed at boot time.
- For users, a clarifying document should be added somewhere in Documentation.

Fine to me.


So, Gu-san
1) determine all possible ids at boot.
2) clarify in Documentation that cpuid/nodeid can have holes because of 1).
3) It would be good if others gave us an ack.

Also fine.
But before going ahead with this, could you please reconsider determining the ids when a cpu
is first present (the implementation in this patchset)?
Though it is not perfect in some respects, we can set aside the doubts
mentioned above, since cpu/node hotplug is not a frequent operation, and there seems
to be nothing harmful to us if we go this way.


Is it such heavy work? Hmm. My requests are:

Implement your patches as
- Please don't change current behavior at boot.
- Remember all possible apicids and give them future cpuids if not assigned.
as step 1.
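
The intent of step 1 can be sketched as a small user-space model in Python.
This is only an illustration of the idea, not kernel code and not the patch
itself; the class CpuIdMap and its methods are hypothetical names. The
assumption is that cpus present at boot keep the ids they get today
(0, 1, 2, ... in enumeration order), and every other possible apicid gets one
of the following ids reserved for later use.

```python
class CpuIdMap:
    """Toy model: fix the apicid -> cpuid mapping once, at boot."""

    def __init__(self, present_apicids, possible_apicids):
        self._cpuid_of = {}
        # Boot-time behavior unchanged: present cpus get 0..n-1 in order.
        for apicid in present_apicids:
            self._assign(apicid)
        # Remember every other possible apicid and reserve its future cpuid.
        for apicid in possible_apicids:
            self._assign(apicid)
        self.online = {self._cpuid_of[a] for a in present_apicids}

    def _assign(self, apicid):
        # Hand out the next unused id, but only for apicids not yet mapped.
        if apicid not in self._cpuid_of:
            self._cpuid_of[apicid] = len(self._cpuid_of)

    def cpuid(self, apicid):
        return self._cpuid_of[apicid]

    def hot_add(self, apicid):
        # Hot-add only looks up the reserved id; it never allocates a new one.
        cpu = self._cpuid_of[apicid]
        self.online.add(cpu)
        return cpu

    def hot_remove(self, apicid):
        self.online.discard(self._cpuid_of[apicid])


if __name__ == "__main__":
    m = CpuIdMap(present_apicids=[0, 2], possible_apicids=[0, 2, 4, 6])
    m.hot_add(6)              # apicid 6 comes up before apicid 4
    print(sorted(m.online))   # online ids now have a hole where apicid 4 would be
```

In this model the id of a cpu no longer depends on hot-add order, and a failed
or skipped hot-add simply leaves a hole, which is why the documentation in
step 2) is needed.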

Please fix dynamic pxm<->node detection in step 2.

In the future, memory-less node handling on x86 should be revisited.

Thanks,
-Kame






