Re: Kernel 5.15 doesn't detect SATA drive on boot

From: Krzysztof Wilczyński
Date: Tue Nov 16 2021 - 18:26:23 EST


[+CC Arnd, Bjorn, Marc and Sasha for visibility]

Hello Damien and Yuji,

[...]
> > I'm using Arch Linux on MacBook Air 2010. I updated `linux` package[1]
> > from v5.14.16 to v5.15.2 the other day, and the boot process stalled
> > with the following message.
> >
> > ```shell
> > :: running early hook [udev]
> > Starting version 249.6-3-arch
> > :: running hook [udev]
> > :: Triggering uevents...
> > Waiting 10 seconds for device /dev/sda3 ...
> > ERROR: device '/dev/sda3' not found. Skipping fsck.
> > :: mounting '/dev/sda' on real root
> > mount: /new_root: no filesystem type specified.
> > You are now being dropped into an emergency shell.
> > sh: can't access tty; job control turned off
> > [rootfs ]#
> > ```
> >
> > In the emergency shell there are no `sda` devices when I run `ls /dev/`.
> > After downgrading the kernel, the boot process works properly.
> >
> > See also Arch Linux bug tracker[2]. There are similar reports on
> > Apple devices.
> >
> > `dmesg` output in the emergency shell is attached. I guess this issue is
> > related to libata, so CCed to Damien Le Moal.
>
> I think that this problem is due to recent PCI subsystem changes which broke Mac
> support. The problem shows up as interrupts not being delivered, which in
> turn results in the kernel assuming that the drive is not working (see the
> timeout error messages in your dmesg output). Hence your boot drive detection
> fails and there is no rootfs to mount.
>
> Adding linux-pci list.
>
>
>
> >
> > Regards.
> >
> > [1] https://archlinux.org/packages/core/x86_64/linux/
> > [2] https://bugs.archlinux.org/task/72734

The error in the dmesg output (see [2] where the log file is attached)
looks similar to the problem reported a week or so ago, as per:

https://lore.kernel.org/linux-pci/ee3884db-da17-39e3-4010-bcc8f878e2f6@xxxxxxxxxxx/

The problematic commits were reverted by Bjorn, and the pull request that
did it was accepted, as per:

https://lore.kernel.org/linux-pci/20211111195040.GA1345641@bhelgaas/

Thus, this should have made its way into 5.16-rc1, I suppose. We might have
to back-port this to the stable and long-term kernels.

Yuji, could you, if you have some time to spare, try 5.16-rc1 to see if
this has gotten better on your system?
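
If it helps to compare the two kernels, below is a rough sketch for
checking whether interrupts reach the SATA controller at all. It assumes
the controller is bound to the ahci driver and shows up under that name in
/proc/interrupts, which I have not verified for this machine; irq_total is
just a hypothetical helper name.

```shell
# Hypothetical helper: sum the per-CPU interrupt counts of every line in
# an interrupts table whose text matches a (case-insensitive) pattern.
#   $1: pattern to look for, e.g. "ahci" (assumed driver name)
#   $2: file to read, defaults to /proc/interrupts
irq_total() {
    awk -v pat="$1" '
        tolower($0) ~ tolower(pat) {
            # Field 1 is the IRQ number; the purely numeric fields that
            # follow are the per-CPU counters.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                s += $i
        }
        END { print s + 0 }
    ' "${2:-/proc/interrupts}"
}

# Example: total interrupts seen so far for the assumed ahci driver.
irq_total ahci
```

A total that stays at 0 on 5.15.2, but grows on 5.14.16, would be
consistent with the interrupts not being delivered.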

Krzysztof