Re: [External] Re: [PATCH v7 0/9] Parallel CPU bringup for x86_64

From: Usama Arif
Date: Thu Feb 09 2023 - 04:49:35 EST




On 09/02/2023 03:53, Paul E. McKenney wrote:
On Tue, Feb 07, 2023 at 11:04:27PM +0000, Usama Arif wrote:
Tested on v7, doing INIT/SIPI/SIPI in parallel brings down the time for
smpboot from ~700ms to 100ms (85% improvement) on a server with 128 CPUs
split across 2 NUMA nodes.

The major change over v6 is keeping parallel smp support enabled in AMD.
APIC ID for parallel CPU bringup is now obtained from CPUID leaf 0x0B
(for x2APIC mode) otherwise CPUID leaf 0x1 (8 bits).

The patch for reusing timer calibration for secondary CPUs is also removed
from the series as its not part of parallel smp bringup and needs to be
further thought about.

Running rcutorture on this got me the following NULL pointer dereference
on scenario TREE01:

------------------------------------------------------------------------

[   34.662066] smpboot: CPU 0 is now offline
[   34.674075] rcu: NOCB: Cannot CB-offload offline CPU 25
[   35.038003] rcu: De-offloading 5
[   35.112997] rcu: Offloading 12
[   35.716011] smpboot: Booting Node 0 Processor 0 APIC 0x0
[   35.762685] BUG: kernel NULL pointer dereference, address: 0000000000000001
[   35.764278] #PF: supervisor instruction fetch in kernel mode
[   35.765530] #PF: error_code(0x0010) - not-present page
[   35.766700] PGD 0 P4D 0
[   35.767278] Oops: 0010 [#1] PREEMPT SMP PTI
[   35.768223] CPU: 36 PID: 0 Comm: swapper/36 Not tainted 6.2.0-rc1-00206-g18a37610b632-dirty #3563
[   35.770201] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

------------------------------------------------------------------------

Given an x86 system with KVM and qemu, this can be reproduced by running
the following from the top-level directory in the Linux-kernel source
tree:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs "TREE01 TINY01" --trust-make

Out of 15 runs, 14 blew up just after the first attempt to bring CPU
0 back online. The 15th run blew up just after the second attempt to
bring CPU 0 online, the first attempt having succeeded.

My guess is that the CONFIG_BOOTPARAM_HOTPLUG_CPU0=y Kconfig option is
tickling this bug. This Kconfig option has been added to the TREE01
scenario in the -rcu tree's "dev" branch, which might mean that this test
would pass on mainline. But CONFIG_BOOTPARAM_HOTPLUG_CPU0=y is not new,
only rcutorture's testing of it.

Thoughts?

Thanx, Paul

It looks like its because of the initial_gs, initial_stack and
early_gdt_descr not being setup properly for CPU0 hotplug, i.e. init_cpu_data isnt called in cpu0 hotplug case.

Its easy to test, just by doing
echo 0 > /sys/devices/system/cpu/cpu0/online;
echo 1 > /sys/devices/system/cpu/cpu0/online;

As a quick check, if we do something like below (probably there is a much better place to set these..), the above hotplug commands will work.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3ec5182d9698..184135c47ee5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1080,6 +1080,7 @@ wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
wakeup_cpu0_nmi, 0, "wake_cpu0");

if (!boot_error) {
+ initial_gs = per_cpu_offset(cpu);
enable_start_cpu0 = 1;
*cpu0_nmi_registered = 1;
id = apic->dest_mode_logical ? cpu0_logical_apicid : apicid;
@@ -1188,10 +1189,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
boot_error = apic->wakeup_secondary_cpu_64(apicid, start_ip);
else if (apic->wakeup_secondary_cpu)
boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
- else
+ else {
+ if(!cpu) {
+ early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
+ initial_stack = idle->thread.sp;
+ }
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
cpu0_nmi_registered);
-
+ }
return boot_error;
}