Re: divide error in select_task_rq_fair()

From: Myron Stowe
Date: Thu Nov 11 2010 - 13:28:13 EST


On Fri, 2010-11-05 at 07:17 +0100, Eric Dumazet wrote:
> On Thursday, 04 November 2010 at 20:00 -0600, Bjorn Helgaas wrote:
>
> > Is that going to help you debug the problem? The solution is not going
> > to be something like "set NR_CPUS=x". If NR_CPUS is too small, the
> > machine should still *boot*, even if we can't use all the CPUs in the
> > box.
> >
>
> Yes, it will help to understand the layout of cpu / domains and make
> appropriate changes.
>
> Alternative is you send me such a machine :=)

I opened a BZ on this issue, as it appears to be a regression:
https://bugzilla.kernel.org/show_bug.cgi?id=22662

As indicated in the BZ, I also bisected the kernel, which pointed to the
commit below. Reverting 50f2d7f682f9c0ed58191d0982fe77888d59d162
re-enabled booting on the box in question (an HP dl980g7). Let me
know what further info you need, or which patches to test, to debug this.

Thanks,

commit 50f2d7f682f9c0ed58191d0982fe77888d59d162
Author: Nikanth Karthikesan <knikanth@xxxxxxx>
Date: Thu Sep 30 17:34:10 2010 +0530

x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA

commit d9c2d5ac6af87b4491bff107113aaf16f6c2b2d9 "x86, numa: Use near(er)
online node instead of roundrobin for NUMA" changed NUMA initialization on
Intel to choose the nearest online node or first node. Fake NUMA would be
better off with round-robin initialization, instead of all CPUs on the
first node. Change the choice of first node back to round-robin.

For testing NUMA kernel behaviour without cpusets and NUMA aware
applications, it would be better to have cpus in different nodes, rather
than all in a single node. With cpusets, migration-of-tasks scenarios
cannot be tested.

I guess having it round-robin shouldn't affect the use cases for all cpus
on the first node.

The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to
be the case, which was changed by commit d9c2d5ac6. It changed from
roundrobin to nearer or first node. And I couldn't find any reason for
this change in its changelog.

Signed-off-by: Nikanth Karthikesan <knikanth@xxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
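
To illustrate the two placement policies the changelog above contrasts,
here is a minimal user-space sketch. It is not the kernel code from
arch/x86/mm/numa_64.c: the function names (assign_first_node,
assign_round_robin, cpus_on_node) and the plain int array standing in for
the cpu-to-node map are hypothetical, chosen only for illustration. It
makes no claim about what actually triggers the divide error on this box.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not the kernel's functions. */

/* "First node" policy: every CPU lands on node 0. */
static void assign_first_node(int *cpu_to_node, size_t nr_cpus)
{
	for (size_t cpu = 0; cpu < nr_cpus; cpu++)
		cpu_to_node[cpu] = 0;
}

/* Round-robin policy: CPU n goes to node (n mod nr_nodes). */
static void assign_round_robin(int *cpu_to_node, size_t nr_cpus, int nr_nodes)
{
	int next = 0;

	/* Guard the modulo below: treat "no nodes" as a single node. */
	if (nr_nodes <= 0)
		nr_nodes = 1;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		cpu_to_node[cpu] = next;
		next = (next + 1) % nr_nodes;
	}
}

/* Count how many CPUs were placed on a given node. */
static size_t cpus_on_node(const int *cpu_to_node, size_t nr_cpus, int node)
{
	size_t count = 0;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu_to_node[cpu] == node)
			count++;
	return count;
}
```

With 8 CPUs and 4 fake nodes, the round-robin helper leaves two CPUs on
each node, whereas the first-node helper leaves all 8 on node 0 and the
other nodes with no CPUs at all.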
>
> Thanks
>
>


--
Myron Stowe Linux Kernel Developer
Fort Collins, CO Office of Corporate Strategy and Technology
