[bug] Re: [PATCH] - Fix stack overflow for large values of MAX_APICS

From: Ingo Molnar
Date: Tue Jun 24 2008 - 06:24:30 EST



* Ingo Molnar <mingo@xxxxxxx> wrote:

> * Jack Steiner <steiner@xxxxxxx> wrote:
>
> > physid_mask_of_physid() causes a huge stack (12k) to be created if
> > the number of APICS is large. Replace physid_mask_of_physid() with a
> > new function that does not create large stacks. This is a problem
> > only on large x86_64 systems.
>
> this indeed fixes the crash i reported here:
>
> http://lkml.org/lkml/2008/6/19/98
>
> so i've added both this and the MAXAPICS patch to tip/x86/uv, and will
> test it some more. Let's hope it all goes well this time :-)
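
To illustrate the stack problem described in the quoted patch text, here is a minimal user-space sketch. It is not the kernel code and not the actual patch: DEMO_MAX_APICS, demo_physid_mask_t and all demo_* functions are hypothetical names, and the size is an assumed value chosen only to make the cost visible. It contrasts a helper that returns a large bitmask by value (the whole mask lives on the stack as a temporary) with one that fills in storage supplied by the caller.

```c
#include <assert.h>
#include <string.h>

/* Assumed size for illustration only; not taken from any kernel config. */
#define DEMO_MAX_APICS     32768
#define DEMO_BITS_PER_LONG (8 * sizeof(unsigned long))
#define DEMO_MASK_LONGS    (DEMO_MAX_APICS / DEMO_BITS_PER_LONG)

/* Hypothetical stand-in for a physid mask: one bit per possible APIC id. */
typedef struct {
    unsigned long mask[DEMO_MASK_LONGS];
} demo_physid_mask_t;

/*
 * By-value style: the mask is built in a local variable and copied back
 * to the caller, so every call site needs stack space for the full mask.
 */
static demo_physid_mask_t demo_mask_of_physid(unsigned int physid)
{
    demo_physid_mask_t m;

    assert(physid < DEMO_MAX_APICS);
    memset(&m, 0, sizeof(m));
    m.mask[physid / DEMO_BITS_PER_LONG] |= 1UL << (physid % DEMO_BITS_PER_LONG);
    return m;
}

/*
 * Pointer style: the caller decides where the mask lives (for example in
 * static storage), so the helper itself needs no large stack temporary.
 */
static void demo_set_mask_of_physid(unsigned int physid, demo_physid_mask_t *m)
{
    assert(physid < DEMO_MAX_APICS);
    memset(m, 0, sizeof(*m));
    m->mask[physid / DEMO_BITS_PER_LONG] |= 1UL << (physid % DEMO_BITS_PER_LONG);
}

/* Static storage: this mask is not placed on the stack. */
static demo_physid_mask_t demo_static_mask;

/* Returns 1 if both styles produce the same mask for the given id. */
static int demo_compare(unsigned int physid)
{
    demo_physid_mask_t by_value = demo_mask_of_physid(physid);

    demo_set_mask_of_physid(physid, &demo_static_mask);
    return memcmp(&by_value, &demo_static_mask, sizeof(by_value)) == 0;
}
```

With the assumed 32768 ids, sizeof(demo_physid_mask_t) is 4096 bytes, so a few by-value temporaries in one function are enough to eat a large part of a small fixed-size kernel stack. Both helpers compute the same mask; only where the storage lives differs.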

-tip auto-testing found a new boot failure on x86 which happens if
NR_CPUS is changed from 8 to 4096. The hang goes like this:

Linux version 2.6.26-rc7-tip (mingo@dione) (gcc version 4.2.3) #10233 SMP Tue Jun 24 12:13:46 CEST 2008
[...]
initcall init_mnt_writers+0x0/0x8c returned 0 after 0 msecs
calling eventpoll_init+0x0/0x9a
initcall eventpoll_init+0x0/0x9a returned 0 after 0 msecs
calling anon_inode_init+0x0/0x11a
initcall anon_inode_init+0x0/0x11a returned 0 after 0 msecs
calling pcie_aspm_init+0x0/0x27
initcall pcie_aspm_init+0x0/0x27 returned 0 after 0 msecs
calling acpi_event_init+0x0/0x57
[... hard hang ...]

on a good bootup, it would continue like this:

initcall acpi_event_init+0x0/0x57 returned 0 after 38 msecs
calling pnp_system_init+0x0/0x17
[...]

the config, full bootlog and reproducer bzImage are at:

http://redhat.com/~mingo/misc/config-Tue_Jun_24_07_44_17_CEST_2008.bad
http://redhat.com/~mingo/misc/log-Tue_Jun_24_07_44_17_CEST_2008.bad
http://redhat.com/~mingo/misc/bzImage-Tue_Jun_24_07_44_17_CEST_2008.bad

changing CONFIG_NR_CPUS from 4096 to 8 causes the system to boot up
fine.

Ingo