Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64

From: Darren Hart
Date: Mon Mar 20 2023 - 14:08:33 EST


On Sat, Mar 18, 2023 at 11:35:44AM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 23:28, Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Mar 16, 2023 at 07:55:36PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 18:52, Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:
> ...
> > > >
> > > > Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> > > > system completed the boot without any soft lockups.
> > > >
> > >
> > > Thanks for confirming. I'll take that as a tested-by
> >
> > The solution in the current branch looks like the best approach we have to date
> > to address the broadest of affected systems. We could switch the eMAG test to an
> > MIDR test I believe (but this won't work for Altra as that would capture all the
> > Neoverse N1 cores beyond Altra). I can look into the MIDR test if you think it's
> > worthwhile - but since I don't think we can eliminate the SMBIOS string test, it
> > doesn't buy us much since we don't need a greedier eMAG test (there aren't more
> > of them to match).
> >
> > Given that some OEM Altra platforms change the processor ID, I don't see a
> > better solution currently than adding their "product name" to the SMBIOS
> > string tests, unfortunately.
> >
>
> Indeed. I spotted a Gigabyte system [0] with a different processor ID,
> but with a version we can test for.
>
> So for now, I'll go with
>
> 	socid = (u32 *)record->processor_id;
> 	switch (*socid & 0xffff000f) {
> 		static char const altra[] = "Ampere(TM) Altra(TM) Processor";
> 		static char const emag[] = "eMAG";
> 	default:
> 		version = efi_get_smbios_string(&record->header, 4,
> 						processor_version);
> 		if (!version || (strncmp(version, altra, sizeof(altra) - 1) &&
> 				 strncmp(version, emag, sizeof(emag) - 1)))
> 			break;
>
> 		fallthrough;
>
> 	case 0x0a160001:	// Altra
> 	case 0x0a160002:	// Altra Max
> 		efi_warn("Working around broken SetVirtualAddressMap()\n");
> 		...
>
> which should cover all the affected systems we encountered so far.
>
> I'll push this to linux-next to let it soak for a little bit, and then
> send it to Linus somewhere during the week

Thank you Ard, I think this is our best option.
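
For anyone following along, here is a minimal userspace sketch of the matching
logic quoted above. It is not the kernel code: needs_vamap_workaround() and
has_prefix() are hypothetical names, the SoC ID and version string are passed
in directly instead of being read from the SMBIOS type 4 record, and the
result is returned as a bool rather than acted on. The mask 0xffff000f drops
bits 4 through 15 of the ID before comparing, and the string test is a prefix
match, as in the snippet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if `s` is non-NULL and begins with `prefix`. */
static bool has_prefix(const char *s, const char *prefix)
{
	return s != NULL && strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical userspace model of the check. `socid` stands in for the
 * first 32 bits of the SMBIOS type 4 processor ID, `version` for the
 * processor version string (NULL if the string is absent).
 */
static bool needs_vamap_workaround(uint32_t socid, const char *version)
{
	switch (socid & 0xffff000f) {
	case 0x0a160001:	/* Altra */
	case 0x0a160002:	/* Altra Max */
		return true;
	default:
		/* Unknown ID: fall back to the version string prefix test. */
		return has_prefix(version, "Ampere(TM) Altra(TM) Processor") ||
		       has_prefix(version, "eMAG");
	}
}
```

With this model, a known SoC ID matches regardless of the version string, and
an unrecognized ID matches only when the version string starts with one of
the two prefixes.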

--
Darren Hart
Ampere Computing / OS and Kernel