Re: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)

From: William Lee Irwin III
Date: Mon Nov 24 2003 - 17:58:13 EST


On Mon, 2003-11-24 at 02:00, William Lee Irwin III wrote:
>> A similar (but more elaborate) fix is in 2.6.

On Mon, Nov 24, 2003 at 11:49:18AM -0500, Len Brown wrote:
> Why is the additional variable "kicked" in 2.6 necessary?
> Appears that kicked == (cpucount + 1), and the loop already
> compares that to NR_CPUS via max_cpus:
> if (max_cpus <= cpucount+1)
> continue;
> Though I think it would read more clearly this way:
> if (cpucount + 1 >= max_cpus)
> break;
> Speaking of max_cpus, it would probably be a good thing if maxcpus() did
> not allow the administrator to set max_cpus > NR_CPUS at boot time.

There's some kind of shenanigan going on with cpucount I can't be arsed
to decipher where it's incremented and decremented all over the place;
do_boot_cpu() returns status, so counting successes got the bug fixed.
Fixing max_cpus > NR_CPUS has some kind of core impact. So I've pretty
much punted on both killing kicked and fixing max_cpus. I anticipate
a particular someone I don't want to hear from complaining loudly.

Maybe something like the below (untested) would be helpful.


-- wli


Propagate kicked down to do_boot_cpu() and nuke the redundant cpucount.
The choice of which to nuke was based on cpucount being modified all
over the place and actually being 1 less than all cpus, vs. kicked being
a counter maintained all in one place and actually representing the
total number of cpus. Old semantics are AFAICT preserved apart from
terminating the wakeup loop at kicked < max(NR_CPUS, max_cpus), which is
actually a darn good idea since there's no reason to fiddle with the
rest of the APIC ID space once we've got all the cpus we're allowed to.


diff -urpN linux-2.6.0-test9/arch/i386/kernel/smpboot.c maxcpus-2.6.0-test9/arch/i386/kernel/smpboot.c
--- linux-2.6.0-test9/arch/i386/kernel/smpboot.c 2003-10-25 11:43:36.000000000 -0700
+++ maxcpus-2.6.0-test9/arch/i386/kernel/smpboot.c 2003-11-24 14:44:50.000000000 -0800
@@ -426,8 +426,6 @@ void __init smp_callin(void)
synchronize_tsc_ap();
}

-int cpucount;
-
extern int cpu_idle(void);

/*
@@ -772,7 +770,7 @@ wakeup_secondary_cpu(int phys_apicid, un

extern cpumask_t cpu_initialized;

-static int __init do_boot_cpu(int apicid)
+static int __init do_boot_cpu(int apicid, int cpu)
/*
* NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
* (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -781,11 +779,10 @@ static int __init do_boot_cpu(int apicid
{
struct task_struct *idle;
unsigned long boot_error;
- int timeout, cpu;
+ int timeout;
unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0;

- cpu = ++cpucount;
/*
* We can't use kernel_thread since we must avoid to
* reschedule the child.
@@ -871,7 +868,6 @@ static int __init do_boot_cpu(int apicid
unmap_cpu_to_logical_apicid(cpu);
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
- cpucount--;
}

/* mark "stuck" area as not stuck */
@@ -1021,7 +1017,7 @@ static void __init smp_boot_cpus(unsigne
Dprintk("CPU present map: %lx\n", physids_coerce(phys_cpu_present_map));

kicked = 1;
- for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++) {
+ for (bit = 0; kicked < min(NR_CPUS, maxcpus) && bit < MAX_APICS; bit++) {
apicid = cpu_present_to_apicid(bit);
/*
* Don't even attempt to start the boot CPU!
@@ -1031,10 +1027,7 @@ static void __init smp_boot_cpus(unsigne

if (!check_apicid_present(bit))
continue;
- if (max_cpus <= cpucount+1)
- continue;
-
- if (do_boot_cpu(apicid))
+ if (do_boot_cpu(apicid, kicked))
printk("CPU #%d not responding - cannot use it.\n",
apicid);
else
@@ -1055,7 +1048,7 @@ static void __init smp_boot_cpus(unsigne
bogosum += cpu_data[cpu].loops_per_jiffy;
printk(KERN_INFO
"Total of %d processors activated (%lu.%02lu BogoMIPS).\n",
- cpucount+1,
+ kicked,
bogosum/(500000/HZ),
(bogosum/(5000/HZ))%100);

@@ -1069,7 +1062,7 @@ static void __init smp_boot_cpus(unsigne
* approved Athlon
*/
if (tainted & TAINT_UNSAFE_SMP) {
- if (cpucount)
+ if (kicked > 1)
printk (KERN_INFO "WARNING: This combination of AMD processors is not suitable for SMP.\n");
else
tainted &= ~TAINT_UNSAFE_SMP;
@@ -1113,7 +1106,7 @@ static void __init smp_boot_cpus(unsigne
/*
* Synchronize the TSC with the AP
*/
- if (cpu_has_tsc && cpucount && cpu_khz)
+ if (cpu_has_tsc && kicked > 1 && cpu_khz)
synchronize_tsc_bp();
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/