Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64

From: Linux regression tracking #adding (Thorsten Leemhuis)
Date: Thu Mar 16 2023 - 05:45:17 EST


[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 16.03.23 08:54, Andrea Righi wrote:
> Hello,
>
> the latest v6.2.6 kernel fails to boot on some arm64 systems, the kernel
> gets stuck and never completes the boot. On the console I see this:
>
> [ 72.043484] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 72.049571] rcu: 22-...0: (30 GPs behind) idle=b10c/1/0x4000000000000000 softirq=164/164 fqs=6443
> [ 72.058520] (detected by 28, t=15005 jiffies, g=449, q=174 ncpus=32)
> [ 72.064949] Task dump for CPU 22:
> [ 72.068251] task:kworker/u64:5 state:R running task stack:0 pid:447 ppid:2 flags:0x0000000a
> [ 72.078156] Workqueue: efi_rts_wq efi_call_rts
> [ 72.082595] Call trace:
> [ 72.085029] __switch_to+0xbc/0x100
> [ 72.088508] 0xffff80000fe83d4c
>
> After that, as a consequence, I start to get a lot of hung task timeout traces.
>
> I tried to bisect the problem and I found that the offending commit is
> this one:
>
> e7b813b32a42 ("efi: random: refresh non-volatile random seed when RNG is initialized")
>
> I've reverted this commit for now and everything works just fine, but I
> was wondering if the problem could be caused by a lack of entropy on
> these arm64 boxes or something else.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e7b813b32a42
#regzbot title efi: stuck at boot (efi_call_rts) on arm64
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.