Re: [PATCH] driver core: ensure a device has valid node id in device_add()

From: Yunsheng Lin
Date: Wed Sep 11 2019 - 03:23:54 EST


On 2019/9/11 14:49, Michal Hocko wrote:
> On Wed 11-09-19 14:15:51, Yunsheng Lin wrote:
>> On 2019/9/11 13:33, Michal Hocko wrote:
>>> On Tue 10-09-19 14:53:39, Michal Hocko wrote:
>>>> On Tue 10-09-19 20:47:40, Yunsheng Lin wrote:
>>>>> On 2019/9/10 19:12, Greg KH wrote:
>>>>>> On Tue, Sep 10, 2019 at 01:04:51PM +0200, Michal Hocko wrote:
>>>>>>> On Tue 10-09-19 18:58:05, Yunsheng Lin wrote:
>>>>>>>> On 2019/9/10 17:31, Greg KH wrote:
>>>>>>>>> On Tue, Sep 10, 2019 at 02:43:32PM +0800, Yunsheng Lin wrote:
>>>>>>>>>> On 2019/9/9 17:53, Greg KH wrote:
>>>>>>>>>>> On Mon, Sep 09, 2019 at 02:04:23PM +0800, Yunsheng Lin wrote:
>>>>>>>>>>>> Currently a device does not belong to any of the numa nodes
>>>>>>>>>>>> (dev->numa_node is NUMA_NO_NODE) when the node id is neither
>>>>>>>>>>>> specified by fw nor by virtual device layer and the device has
>>>>>>>>>>>> no parent device.
>>>>>>>>>>>
>>>>>>>>>>> Is this really a problem?
>>>>>>>>>>
>>>>>>>>>> Not really.
>>>>>>>>>> Someone need to guess the node id when it is not specified, right?
>>>>>>>>>
>>>>>>>>> No, why? Guessing guarantees you will get it wrong on some systems.
>>>>>>>>>
>>>>>>>>> Are you seeing real problems because the id is not being set? What
>>>>>>>>> problem is this fixing that you can actually observe?
>>>>>>>>
>>>>>>>> When passing the return value of dev_to_node() to cpumask_of_node()
>>>>>>>> without checking the node id if the node id is not valid, there is
>>>>>>>> global-out-of-bounds detected by KASAN as below:
>>>>>>>
>>>>>>> OK, I seem to remember this being brought up already. And now when I
>>>>>>> think about it, we really want to make cpumask_of_node NUMA_NO_NODE
>>>>>>> aware. That means using the same trick the allocator does for this
>>>>>>> special case.
>>>>>>
>>>>>> That seems reasonable to me, and much more "obvious" as to what is going
>>>>>> on.
>>>>>>
>>>>>
>>>>> Ok, thanks for the suggestion.
>>>>>
>>>>> For arm64 and x86, there are two versions of cpumask_of_node().
>>>>>
>>>>> when CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node()
>>>>> in arch/x86/mm/numa.c is used, which does partial node id checking:
>>>>>
>>>>> const struct cpumask *cpumask_of_node(int node)
>>>>> {
>>>>> if (node >= nr_node_ids) {
>>>>> printk(KERN_WARNING
>>>>> "cpumask_of_node(%d): node > nr_node_ids(%u)\n",
>>>>> node, nr_node_ids);
>>>>> dump_stack();
>>>>> return cpu_none_mask;
>>>>> }
>>>>> if (node_to_cpumask_map[node] == NULL) {
>>>>> printk(KERN_WARNING
>>>>> "cpumask_of_node(%d): no node_to_cpumask_map!\n",
>>>>> node);
>>>>> dump_stack();
>>>>> return cpu_online_mask;
>>>>> }
>>>>> return node_to_cpumask_map[node];
>>>>> }
>>>>>
>>>>> when CONFIG_DEBUG_PER_CPU_MAPS is undefined, the cpumask_of_node()
>>>>> in arch/x86/include/asm/topology.h is used:
>>>>>
>>>>> static inline const struct cpumask *cpumask_of_node(int node)
>>>>> {
>>>>> return node_to_cpumask_map[node];
>>>>> }
>>>>
>>>> I would simply go with. There shouldn't be any need for heavy weight
>>>> checks that CONFIG_DEBUG_PER_CPU_MAPS has.
>>>>
>>>> static inline const struct cpumask *cpumask_of_node(int node)
>>>> {
>>>> /* A nice comment goes here */
>>>> if (node == NUMA_NO_NODE)
>>
>> How about "(unsigned int)node >= nr_node_ids", this is suggested
>> by Peter, it checks the case where the node id set by fw is bigger
>> or equal than nr_node_ids, and still handle the < 0 case, which
>> includes NUMA_NO_NODE.
>
> Isn't that a plain bug? Is something like that really happening?

I have not seen one happened before except the NUMA_NO_NODE case.
Even with NUMA_NO_NODE case, we did not see it until we turn on
the KASAN detection.

It seems that there is no protection that prevent setting the node
of device to an invalid node.
And the kernel does have a few different check now:
1) some does " < 0" check;
2) some does "== NUMA_NO_NODE" check;
3) some does ">= MAX_NUMNODES" check;
4) some does "< 0 || >= MAX_NUMNODES || !node_online(node)" check.

We need to be consistent about the checking, right?