Re: [6.3][regression] after commit 7170b7ed6acbde523c5d362c8978c60df4c30f30 my system stuck in initramfs forever

From: Thorsten Leemhuis
Date: Mon Feb 27 2023 - 03:21:51 EST


On 26.02.23 08:31, David Gow wrote:
> On Sun, 26 Feb 2023 at 14:02, Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
>> On 26.02.23 02:11, David Gow wrote:
>>> On Sat, 25 Feb 2023 at 23:53, Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
>>>> On 25.02.23 15:55, Mikhail Gavrilov wrote:
>>>>> On Sat, Feb 25, 2023 at 7:22 PM Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> [CCing the regression list, as it should be in the loop for regressions:
>>>>>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>>>>>
>>>>>> On 25.02.23 14:51, Mikhail Gavrilov wrote:
>>>>>>> new kernel release cycle returning with new bugs
>>>>>>> Today my system got stuck in initramfs environment after updating to
>>>>>>> commit d2980d8d826554fa6981d621e569a453787472f8.
>>>>>>>
>>>>>>> I still do not understand how to configure the network inside the
>>>>>>> initramfs environment to grab the logs.
>>>>>>> Since an attempt to rebuild the initramfs with all modules (dracut
>>>>>>> --no-hostonly --force) leads to the stuck initramfs environment and
>>>>>>> impossible entering into initramfs console.
>>>>>>
>>>>>> Do you see any error messages? I have problems since Friday morning as
>>>>>> well (stuck in Fedora's initramfs) and see a lot of BPF warnings like
>>>>>> "BPF: invalid name" and "failed to validate module". Was able to do a
>>>>>> screenshot:
>>>>>>
>>>>>> https://www.leemhuis.info/files/misc/Screenshot_ktst-f36-x86-64_2023-02-24_07:53:14.png
>>>>>
>>>>> I also seen such messages
>>>>> https://freeimage.host/i/img-1475.HMPL26l
>>>>
>>>> Pretty sure that's the same problem, at least the symptoms match. If
>>>> anyone needs a config to reproduce this, here's one of mine that shows
>>>> the problem:
>>>>
>>>> https://www.leemhuis.info/files/misc/config
>>>>
>>>>> P.S.: I also use Fedora Rawhide.
>>>>
>>>> Happens for me on all Fedora 36, 37, and 38 (my rawhide build failed for
>>>> other reasons, so I couldn't test).
>>>
>>> Thanks for the report, and sorry this seems to have broken.
>>>
>>> I've not been able to reproduce this locally yet, but I'm looking into it.
>>>
>>> In the meantime, a few questions if you have time:
>>> - Does this happen with CONFIG_KUNIT=y as well as CONFIG_KUNIT=m?
>>> - Does this patch fix it?
>>> https://lore.kernel.org/linux-kselftest/20230225014529.2259752-1-davidgow@xxxxxxxxxx/T/#u
>>
>> Sorry, limited time and about to leave the house for the day. I only
>> could try the latter and did only do a very quick test, but it seems
>> that patch fixes the issue for me.
>>
>
> Thanks! Glad to hear that patch seems to fix it: we'll try to get it
> upstream as soon as possible, then.

Great. I did some more tests (still not many) earlier today, so feel free
to add:

Tested-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>

but I don't care if that doesn't make it.

> I wouldn't worry about testing with CONFIG_KUNIT=y as well at this
> point: I doubt it'll shed any more light on the situation.
>
> I've still been unable to reproduce the issue here, even with a fresh
> install of Fedora Rawhide, and a very recent torvalds/master:

Strange. Wondering if that has something to do with the way Mikhail and
I build the kernel. I build using something that very closely
resembles the SRPM used by Fedora.

> [...]
> Given everything else seems fine here, and that fixing the makefile issues
> with the above patch both seems to fix this and addresses the only real
> issue I could imagine having unpredictable behaviour, I'm reasonably happy
> to consider this "fixed" by that patch. But if this patch _doesn't_
> fix it, or you continue to see some other strange behaviour, we can
> look into fixing it further.

+1

Ciao, Thorsten