Re: arm64: csdlock at early boot due to slow serial (?)
From: Breno Leitao
Date: Thu Jul 03 2025 - 10:14:23 EST
On Thu, Jul 03, 2025 at 11:28:50AM +0100, Mark Rutland wrote:
> On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote:
> > I'm observing two unusual behaviors during the boot process on my SBSA
> > ARM machine, with the upstream kernel (6.16-rc4):
>
> Can you say which SoC in particular that is? Knowing that would help to
> identify whether there's some known erratum, clocking issue, etc.
This is a custom-made, rack-mounted machine based on the Grace CPU. Here is
some information about the hardware:
# lscpu:
Vendor ID: ARM
Model name: Neoverse-V2
Model: 0
Thread(s) per core: 1
Core(s) per socket: 72
Socket(s): 1
Stepping: r0p0
# /proc/cpuinfo
processor : 71
BogoMIPS : 2000.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd4f
CPU revision : 0
# lshw
description: Rack Mount Chassis
product: <Internal name>
vendor: Quanta
version: <Internal name>
width: 64 bits
capabilities: smbios-3.6.0 dmi-3.6.0 smp sve_default_vector_length tagged_addr_disabled
configuration: boot=normal chassis=rackmount family=Default string sku=Default string uuid=...
How can I identify the exact SoC?
> Likewise that might imply more folk to add to Cc.
>
> [...]
>
> > At timestamp 9.69 seconds, the serial console is still flushing messages from
> > 0.92 seconds, indicating that the initial 9-second gap is spent looping in
> > cpu_relax() (about 20,000 times per message), which is clearly suboptimal.
> >
> > Further debugging revealed the following sequence with the pl011 registers:
> >
> > 1) uart_console_write()
> > 2) REG_FR reads BUSY | RXFE | TXFF for a while (~1k cpu_relax() iterations)
> > 3) RXFE and TXFF are cleared, and BUSY stays set for another 17k-19k cpu_relax() iterations
> >
> > Michael has reported a hardware issue where the BUSY bit could get
> > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
> > bit"), which looks very similar. TXFE goes down, but BUSY apparently stays set for a long time.
>
> Looking at the commit message, that was an issue with a "custom
> (non-PrimeCell) implementation of the SBSA UART" present on QDF2400. I
> assume that was something that Qualcomm Datacenter Technologies designed
> themselves.
>
> It's possible that your SoC has a similar issue with whatever IP block
> is being used as the UART, but the issue in that commit certainly
> doesn't apply to most PL011 / SBSA-UART implementations.
That makes total sense. Decoding the SPCR table, I see the following:
# iasl -d spcr.dat
Intel ACPI Component Architecture
ASL+ Optimizing Compiler/Disassembler version 20210604
Copyright (c) 2000 - 2021 Intel Corporation
File appears to be binary: found 56 non-ASCII characters, disassembling
Binary file appears to be a valid ACPI table, disassembling
Input file spcr.dat, Length 0x50 (80) bytes
ACPI: SPCR 0x0000000000000000 000050 (v02 NVIDIA A M I 00000001 ARMH 00010000)
Acpi Data Table [SPCR] decoded
Formatted output: spcr.dsl - 2624 bytes
Thanks,
--breno