Re: linux-next: Tree for June 5

From: Mike Travis
Date: Fri Jun 06 2008 - 10:20:41 EST


Vegard Nossum wrote:
> On Fri, Jun 6, 2008 at 3:50 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
>> On Fri, Jun 6, 2008 at 3:33 PM, Mike Travis <travis@xxxxxxx> wrote:
>>> Vegard Nossum wrote:
>>>> I reproduced it with gcc 4.1.2. I think the error is somewhere in kernel/sched.c.
>>>>
>>>> static int __build_sched_domains(const cpumask_t *cpu_map,
>>>> struct sched_domain_attr *attr)
>>>> {
>>>> ...
>>>> for (i = 0; i < MAX_NUMNODES; i++) {
>>>> ...
>>>> sg = kmalloc_node(sizeof(struct sched_group), GFP_KERNEL, i);
>>>> ...
>>>>
>>>> This code is calling into the allocator with a spurious value of i,
>>>> which causes SLAB to use an index (of 4 in my case) that is out of
>>>> bounds for its nodelist array (at least it hasn't been initialized).
>>>>
>>>> This bit of code (a bit further down, inside the same loop) is also dubious:
>>>>
>>>> sg = kmalloc_node(sizeof(struct sched_group),
>>>> GFP_KERNEL, i);
>>>> if (!sg) {
>>>> printk(KERN_WARNING
>>>> "Can not alloc domain group for node %d\n", j);
>>>> goto error;
>>>> }
>>>>
>>>> Where it passes i to kmalloc_node() but reports an allocation for node
>>>> j. Which one is correct?
>>>>
>> Hm, I think I'm wrong and the code is correct. However...
>>
>>>> Hope this helps, will send an update if I find out more.
>>>>
>>>>
>>>> Vegard
>>>>
>>> Thanks Vegard for tracking this down. My thoughts were along the same
>>> wavelength... ;-)
>
> ...
>
>> This is a P4 3.0GHz with 1 physical CPU (but HT, so two logical CPUs).
>> Yet node 4 is claimed to have a cpu too. That's bogus!
>>
>> (But I don't think it's an error in sched.c any more, probably the
>> code that sets up the node maps.)
>
> Aha.
>
> The error is of course that the node masks for nodes > nr_node_ids are
> not valid. While this function ignores that:
>
> cpumask_t *_node_to_cpumask_ptr(int node)
> {
> if (node_to_cpumask_map == NULL) {
> printk(KERN_WARNING
> "_node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
> node);
> dump_stack();
> return &cpu_online_map;
> }
> return &node_to_cpumask_map[node];
> }
> EXPORT_SYMBOL(_node_to_cpumask_ptr);
>
> Notice the return statement. It needs to check if node < nr_node_ids.
>
>
> Vegard
>


Thanks, yes, I had that same afterthought. It should check the node
index if CONFIG_DEBUG_PER_CPU_MAPS is enabled. One gotcha is that
nr_node_ids is initialized to MAX_NUMNODES until setup_node_to_cpumask_map()
sets it to the correct value, so a range check against it cannot catch
anything before that point. Uses before that should be caught by the
earlier check (the NULL test on node_to_cpumask_map).
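
To make that concrete, here is a rough userspace sketch of the check I
have in mind. It is not a patch: cpumask_t is a simplified stand-in,
DEBUG_PER_CPU_MAPS stands in for CONFIG_DEBUG_PER_CPU_MAPS, and
setup_map(), empty_cpumask and the _checked name are made up for
illustration. Returning an empty mask for a bad node is my assumption
about a reasonable fallback, nothing more.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_NUMNODES 64
#define DEBUG_PER_CPU_MAPS 1	/* stand-in for CONFIG_DEBUG_PER_CPU_MAPS */

/* Simplified stand-in for the kernel's cpumask_t. */
typedef struct { unsigned long bits; } cpumask_t;

static cpumask_t cpu_online_map = { 0x3 };	/* two logical CPUs */
static cpumask_t empty_cpumask = { 0x0 };	/* hypothetical fallback mask */
static cpumask_t *node_to_cpumask_map;		/* NULL until setup has run */
static int nr_node_ids = MAX_NUMNODES;		/* too large until setup has run */

/* Hypothetical helper: mimics the effect of setup_node_to_cpumask_map(). */
static void setup_map(cpumask_t *map, int nr_nodes)
{
	node_to_cpumask_map = map;
	nr_node_ids = nr_nodes;
}

static cpumask_t *node_to_cpumask_ptr_checked(int node)
{
	/* Early callers: the map does not exist yet, nr_node_ids is useless. */
	if (node_to_cpumask_map == NULL) {
		fprintf(stderr,
			"node_to_cpumask_ptr(%d): no node_to_cpumask_map!\n",
			node);
		return &cpu_online_map;
	}
#ifdef DEBUG_PER_CPU_MAPS
	/* After setup: nr_node_ids is valid, so reject out-of-range nodes. */
	if (node < 0 || node >= nr_node_ids) {
		fprintf(stderr,
			"node_to_cpumask_ptr(%d): node >= nr_node_ids(%d)\n",
			node, nr_node_ids);
		return &empty_cpumask;
	}
#endif
	return &node_to_cpumask_map[node];
}
```

With a two-entry map installed via setup_map(), asking for node 4 gets
a warning and the empty mask instead of whatever sits past the end of
the array, so the scheduler would no longer see a cpu on that node.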

Mike