Re: [PATCH] powerpc/kdump: fix kdump kernel hangup issue with hot add CPUs

From: Sourabh Jain
Date: Fri Apr 16 2021 - 07:28:09 EST



On 16/04/21 3:03 pm, Hari Bathini wrote:


On 16/04/21 12:17 pm, Sourabh Jain wrote:
With the kexec_file_load system call, when the system crashes on a
hot-added CPU, the capture kernel hangs and fails to collect the vmcore.

  Kernel panic - not syncing: sysrq triggered crash
  CPU: 24 PID: 6065 Comm: echo Kdump: loaded Not tainted 5.12.0-rc5upstream #54
  Call Trace:
  [c0000000e590fac0] [c0000000007b2400] dump_stack+0xc4/0x114 (unreliable)
  [c0000000e590fb00] [c000000000145290] panic+0x16c/0x41c
  [c0000000e590fba0] [c0000000008892e0] sysrq_handle_crash+0x30/0x40
  [c0000000e590fc00] [c000000000889cdc] __handle_sysrq+0xcc/0x1f0
  [c0000000e590fca0] [c00000000088a538] write_sysrq_trigger+0xd8/0x178
  [c0000000e590fce0] [c0000000005e9b7c] proc_reg_write+0x10c/0x1b0
  [c0000000e590fd10] [c0000000004f26d0] vfs_write+0xf0/0x330
  [c0000000e590fd60] [c0000000004f2aec] ksys_write+0x7c/0x140
  [c0000000e590fdb0] [c000000000031ee0] system_call_exception+0x150/0x290
  [c0000000e590fe10] [c00000000000ca5c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff905b9664
  NIP:  00007fff905b9664 LR: 00007fff905320c4 CTR: 0000000000000000
  REGS: c0000000e590fe80 TRAP: 0c00   Not tainted (5.12.0-rc5upstream)
  MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28000242
        XER: 00000000
  IRQMASK: 0
  GPR00: 0000000000000004 00007ffff5fedf30 00007fff906a7300 0000000000000001
  GPR04: 000001002a7355b0 0000000000000002 0000000000000001 00007ffff5fef616
  GPR08: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
  GPR12: 0000000000000000 00007fff9073a160 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 00007fff906a4ee0 0000000000000002 0000000000000001
  GPR24: 00007fff906a0898 0000000000000000 0000000000000002 000001002a7355b0
  GPR28: 0000000000000002 00007fff906a1790 000001002a7355b0 0000000000000002
  NIP [00007fff905b9664] 0x7fff905b9664
  LR [00007fff905320c4] 0x7fff905320c4
  --- interrupt: c00

<SNIP>

I will update the commit message.

  /**
   * setup_new_fdt_ppc64 - Update the flattend device-tree of the kernel
   *                       being loaded.
@@ -1020,6 +1113,13 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
          }
      }
+    /* Update cpus nodes information to account hotplug CPUs. */
+    if (image->type == KEXEC_TYPE_CRASH) {

Shouldn't this apply to the regular kexec_file_load case as well? There won't be a hang in the regular kexec_file_load case, but for correctness, that kernel should also not see stale CPU info in the FDT.

Yes, it is better to update the FDT for both kexec and kdump.
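
To illustrate the difference, here is a minimal user-space sketch, not the actual patch. It models the FDT as a plain list of CPU ids; `struct toy_fdt`, `update_cpus_node()`, and both `setup_fdt_*()` helpers are hypothetical names made up for this example and do not exist in the kernel. It only contrasts refreshing the CPU list for crash images with refreshing it for every image type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum image_type { IMAGE_TYPE_DEFAULT, IMAGE_TYPE_CRASH };

#define MAX_CPUS 64

/* Toy model of the CPU information carried in a device tree. */
struct toy_fdt {
    int cpu_ids[MAX_CPUS];  /* CPU ids currently recorded in the tree */
    size_t ncpus;           /* number of valid entries in cpu_ids */
};

/* Return true if the tree already records the given CPU id. */
static bool fdt_has_cpu(const struct toy_fdt *fdt, int id)
{
    for (size_t i = 0; i < fdt->ncpus; i++)
        if (fdt->cpu_ids[i] == id)
            return true;
    return false;
}

/*
 * Add every present CPU that the tree is missing.
 * Returns the number of CPUs added, or -1 if the tree is full.
 */
static int update_cpus_node(struct toy_fdt *fdt, const int *present,
                            size_t npresent)
{
    int added = 0;

    for (size_t i = 0; i < npresent; i++) {
        if (fdt_has_cpu(fdt, present[i]))
            continue;
        if (fdt->ncpus == MAX_CPUS)
            return -1;
        fdt->cpu_ids[fdt->ncpus++] = present[i];
        added++;
    }
    return added;
}

/* Shape of the quoted hunk: only crash (kdump) images are refreshed. */
static int setup_fdt_crash_only(enum image_type type, struct toy_fdt *fdt,
                                const int *present, size_t npresent)
{
    if (type == IMAGE_TYPE_CRASH)
        return update_cpus_node(fdt, present, npresent);
    return 0;
}

/* Shape suggested in review: refresh for both kexec and kdump. */
static int setup_fdt_always(enum image_type type, struct toy_fdt *fdt,
                            const int *present, size_t npresent)
{
    (void)type;  /* the image type no longer gates the update */
    return update_cpus_node(fdt, present, npresent);
}
```

With a tree that records CPUs 0 and 8 and a present set of 0, 8, 16, and 24, the crash-only variant leaves a regular kexec image without CPUs 16 and 24, while the unconditional variant adds both.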

Thanks for the review, Hari.

- Sourabh Jain