Re: [6.3][regression] after commit 7170b7ed6acbde523c5d362c8978c60df4c30f30 my system stuck in initramfs forever

From: David Gow
Date: Sun Feb 26 2023 - 02:32:12 EST


On Sun, 26 Feb 2023 at 14:02, Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
>
> On 26.02.23 02:11, David Gow wrote:
> > On Sat, 25 Feb 2023 at 23:53, Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
> >> On 25.02.23 15:55, Mikhail Gavrilov wrote:
> >>> On Sat, Feb 25, 2023 at 7:22 PM Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> [CCing the regression list, as it should be in the loop for regressions:
> >>>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
> >>>>
> >>>> On 25.02.23 14:51, Mikhail Gavrilov wrote:
> >>>>> new kernel release cycle returning with new bugs
> >>>>> Today my system got stuck in initramfs environment after updating to
> >>>>> commit d2980d8d826554fa6981d621e569a453787472f8.
> >>>>>
> >>>>> I still do not understand how to configure the network inside the
> >>>>> initramfs environment to grab the logs.
> >>>>> An attempt to rebuild the initramfs with all modules (dracut
> >>>>> --no-hostonly --force) leads to the same stuck initramfs
> >>>>> environment, and it is impossible to enter the initramfs console.
> >>>>
> >>>> Do you see any error messages? I have problems since Friday morning as
> >>>> well (stuck in Fedora's initramfs) and see a lot of BPF warnings like
> >>>> "BPF: invalid name" and "failed to validate module". Was able to do a
> >>>> screenshot:
> >>>>
> >>>> https://www.leemhuis.info/files/misc/Screenshot_ktst-f36-x86-64_2023-02-24_07:53:14.png
> >>>
> >>> I have also seen such messages:
> >>> https://freeimage.host/i/img-1475.HMPL26l
> >>
> >> Pretty sure that's the same problem, at least the symptoms match. If
> >> anyone needs a config to reproduce this, here's one of mine that shows
> >> the problem:
> >>
> >> https://www.leemhuis.info/files/misc/config
> >>
> >>> P.S.: I also use Fedora Rawhide.
> >>
> >> Happens for me on all Fedora 36, 37, and 38 (my rawhide build failed for
> >> other reasons, so I couldn't test).
> >
> > Thanks for the report, and sorry this seems to have broken.
> >
> > I've not been able to reproduce this locally yet, but I'm looking into it.
> >
> > In the meantime, a few questions if you have time:
> > - Does this happen with CONFIG_KUNIT=y as well as CONFIG_KUNIT=m?
> > - Does this patch fix it?
> > https://lore.kernel.org/linux-kselftest/20230225014529.2259752-1-davidgow@xxxxxxxxxx/T/#u
>
> Sorry, limited time and about to leave the house for the day. I could
> only try the latter and only did a very quick test, but it seems
> that patch fixes the issue for me.
>

Thanks! Glad to hear that patch seems to fix it: we'll try to get it
upstream as soon as possible, then.

I wouldn't worry about testing with CONFIG_KUNIT=y as well at this
point: I doubt it'll shed any more light on the situation.

I've still been unable to reproduce the issue here, even with a fresh
install of Fedora Rawhide, and a very recent torvalds/master:
1ec35eadc3b4 ("Merge tag 'clk-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux")[1].
I even tried plugging in a Keychron C2 keyboard (which also uses
hid_apple), as well as running several KUnit tests which make use of
the hooks functionality: everything worked fine.

Given that everything else seems fine here, and that the makefile fixes
in the above patch both seem to resolve this and address the only real
issue I could imagine causing unpredictable behaviour, I'm reasonably
happy to consider this "fixed" by that patch. But if this patch
_doesn't_ fix it, or you continue to see some other strange behaviour,
we can look into it further.

Thanks again,

-- David

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1ec35eadc3b448c91a6b763371a7073444e95f9d
