Re: arm64 syzbot instances

From: Dmitry Vyukov
Date: Fri Mar 12 2021 - 05:40:10 EST


On Fri, Mar 12, 2021 at 11:11 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > On Fri, Mar 12, 2021 at 9:46 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > > > On Fri, Mar 12, 2021 at 9:40 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > > > On Thu, Mar 11, 2021 at 6:57 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > > > > a) accessing a legacy ISA/LPC port should not result in an oops,
> > > > > but should instead return values with all bits set. There could
> > > > > be a ratelimited console warning about broken drivers, but we
> > > > > can't assume that all drivers work correctly, as some ancient
> > > > > PC style drivers still rely on this.
> > > > > John Garry has recently worked on a related bugfix, so maybe
> > > > > either this is the same bug he encountered (and hasn't merged
> > > > > yet), or if his fix got merged there is still a remaining problem.
> > >
> > > > > b) It should not be possible to open /dev/ttyS3 if the device is
> > > > > not initialized. What is the output of 'cat /proc/tty/driver/serial'
> > > > > on this machine? Do you see any messages from the serial
> > > > > driver in the boot log?
> > > > > Unfortunately there are so many different ways to probe devices
> > > > > in the 8250 driver that I don't know where this comes from.
> > > > > Your config file has
> > > > > CONFIG_SERIAL_8250_PNP=y
> > > > > CONFIG_SERIAL_8250_NR_UARTS=32
> > > > > CONFIG_SERIAL_8250_RUNTIME_UARTS=4
> > > > > CONFIG_SERIAL_8250_EXTENDED=y
> > > > > I guess it's probably the preconfigured uarts that somehow
> > > > > become probed without initialization, but it could also be
> > > > > an explicit device incorrectly described by qemu.
> > > >
> > > >
> > > > Here is the full boot log, /proc/tty/driver/serial and the crash:
> > > > https://gist.githubusercontent.com/dvyukov/084890d9b4aa7cd54f468e652a9b5881/raw/54c12248ff6a4885ba6c530d56b3adad59bc6187/gistfile1.txt
> > >
> > > Ok, so there are four 8250 ports, and none of them are initialized,
> > > while the console is on /dev/ttyAMA0 using a different driver.
> > >
> > > I'm fairly sure this is a bug in the kernel then, not in qemu.
> > >
> > >
> > > I also see that the PCI I/O space gets mapped to a physical address:
> > > [ 3.974309][ T1] pci-host-generic 4010000000.pcie: IO
> > > 0x003eff0000..0x003effffff -> 0x0000000000
> > >
> > > So it's probably qemu that triggers the 'synchronous external
> > > abort' when accessing the PCI I/O space, which in turn hints
> > > towards a bug in qemu. Presumably it only returns data from
> > > I/O ports that are actually mapped to a device when real hardware
> > > is supposed to return 0xffffffff when reading from unused I/O ports.
> > > This would be separate from the work that John did, which only
> > > fixed the kernel for accessing I/O port ranges that do not have
> > > a corresponding MMU mapping to hardware ports.
> >
> > Will John's patch fix this crash w/o any changes in qemu? That would
> > be good enough for syzbot. Otherwise we need to report the issue to
> > qemu.
>
> No, this was a third issue. As far as I remember, this would result in
> a similar problem in the case where there is no PCI bus at all, or
> where no PCI host has an I/O port range, so the inb() from the serial
> driver would cause a page fault. The problem you ran into happens
> in qemu when the PCI I/O ports are mapped to hardware registers
> that cause an exception when accessed.
>
> If you just want to work around the problem for now, it should
> go away if you set CONFIG_SERIAL_8250_RUNTIME_UARTS
> to zero.

It does not happen too often on syzbot so far, so let's try to do the
right thing first.
I've filed: https://bugs.launchpad.net/qemu/+bug/1918917
with a link to this thread. To be fair, I don't fully understand the
details myself, so I hope I relayed your description properly.
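
For reference, the behavior described above for real hardware (a read
from an unused I/O port returns all bits set instead of raising an
exception) can be modeled with a small user-space sketch. This is only
a toy model in Python, not kernel or qemu code: PortSpace, map_device
and looks_unpopulated are made-up names, and the port numbers are
arbitrary examples.

```python
class PortSpace:
    """Toy model of a legacy I/O port space (hypothetical, for illustration)."""

    def __init__(self):
        # Maps a port number to the byte a device register would return.
        self._regs = {}

    def map_device(self, base, values):
        """Back consecutive ports starting at `base` with the given byte values."""
        for offset, value in enumerate(values):
            self._regs[base + offset] = value & 0xFF

    def read(self, port, width=1):
        """Read `width` bytes, little-endian; unmapped bytes read as 0xFF.

        This mirrors the expectation that unused ports return all bits set
        rather than causing a fault.
        """
        result = 0
        for i in range(width):
            byte = self._regs.get(port + i, 0xFF)
            result |= byte << (8 * i)
        return result


def looks_unpopulated(space, base, nregs=8):
    """Return True if every register in the window reads back as 0xFF."""
    return all(space.read(base + i) == 0xFF for i in range(nregs))


if __name__ == "__main__":
    space = PortSpace()
    # Arbitrary example: one device with eight registers at port 0x100.
    space.map_device(0x100, [0x00, 0x01, 0xC1, 0x03, 0x0B, 0x60, 0xB0, 0x00])
    print(hex(space.read(0x200, 4)))        # unmapped 32-bit read
    print(looks_unpopulated(space, 0x100))  # window backed by a device
    print(looks_unpopulated(space, 0x200))  # window with nothing behind it
```

With that model, a driver poking at a port with nothing behind it sees
all-ones and can decide there is no device, which is the outcome the
thread expects instead of a synchronous external abort.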