Re: [PATCHv2] vgaarb: Add module param to allow for choosing the boot VGA device

From: Cal Peake
Date: Tue Jul 05 2022 - 16:42:40 EST


On Tue, 5 Jul 2022, Alex Williamson wrote:

> > > +	ret = sscanf(input, "%x:%x.%x", &bus, &dev, &func);
> > > +	if (ret != 3) {
> > > +		pr_warn("Improperly formatted PCI ID: %s\n", input);
> > > +		return;
> > > +	}
>
> See pci_dev_str_match()

Hi Alex, thanks for the feedback. I'll add this if we wind up going with
some version of my patch.

> > > +	if (boot_vga && boot_vga->is_chosen_one)
> > > +		return false;
> > > +
> > > +	if (bootdev_id == PCI_DEVID(pdev->bus->number, pdev->devfn)) {
> > > +		vgadev->is_chosen_one = true;
> > > +		return true;
> > > +	}
>
> This seems too simplistic, for example PCI code determines whether the
> ROM is a shadow ROM at 0xc0000 based on whether it's the
> vga_default_device() where that default device is set in
> vga_arbiter_add_pci_device() based on the value returned by
> this vga_is_boot_device() function. A user wishing to specify the boot
> VGA device doesn't magically make that device's ROM shadowed into this
> location.
>

I think I understand what you're saying. We're not telling the system what
the boot device is; it's telling us?
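
To make the point concrete, the quoted check can be modeled in plain C as
below. This is a rough sketch, not the arbiter code: struct sketch_vga_dev,
sketch_is_boot_device(), and the firmware_says_boot fallback are hypothetical
stand-ins for the real structures and heuristics. Note that the function only
changes bookkeeping. Nothing in it shadows a ROM or enables VGA routing, which
is the gap described in the review.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the arbiter's per-device state. */
struct sketch_vga_dev {
	uint16_t devid;      /* packed bus/devfn, as PCI_DEVID() would give */
	bool is_chosen_one;  /* set when the user-specified ID matched */
};

/*
 * Models the quoted hunk: a user-chosen device, once recorded, can never
 * be displaced, and a candidate matching bootdev_id always wins.
 * firmware_says_boot is an assumed placeholder for whatever the existing
 * boot-device heuristics would have returned for this candidate.
 */
static bool sketch_is_boot_device(struct sketch_vga_dev *candidate,
				  const struct sketch_vga_dev *boot_vga,
				  uint16_t bootdev_id,
				  bool firmware_says_boot)
{
	if (boot_vga && boot_vga->is_chosen_one)
		return false;

	if (bootdev_id == candidate->devid) {
		candidate->is_chosen_one = true;
		return true;
	}

	return firmware_says_boot;
}
```

In this model the label moves to the user's device regardless of which card
the firmware actually initialized as the console.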

> I also don't see how this actually enables VGA routing to the user
> selected device, where we generally expect the boot device already has
> this enabled.
>
> Furthermore, what's the initialization state of the selected device, if
> it has not had its option ROM executed, is it necessarily in a state to
> accept VGA commands? If we're changing the default VGA device, are we
> fully uncoupling from any firmware notions of the console device?
> Thanks,

Unfortunately, I'm not the best qualified to answer these questions. My
understanding is mostly surface-level until I start digging into the code.

I think the answer to most of them, though, might be that the UEFI firmware
initializes both cards.

During POST, I do get output on both GPUs. One gets the static BIOS text
(Copyright AMI etc.) -- this is the one selected as boot device -- and the
other gets the POST-code counting up.

Once the firmware hands off to the bootloader, whichever GPU has the
active display (both GPUs go to the same display, the input source gets
switched depending on whether I'm using the host or the VM) is where
the bootloader output is.

When the bootloader hands off to the kernel, the boot device chosen by the
firmware gets the kernel output. If that's the host GPU, then everything
is fine.

If that's the VM GPU, then it gets the kernel output until the vfio-pci
driver loads and then all output stops. Back on the host GPU, the screen
is black until the X server spawns[1] but I get no VTs.

With my patch, telling the arbiter that the host GPU is always the boot
device results in everything just working.

With all that said, if you feel this isn't the right way to go, do you
have any thoughts on what would be a better path to try?

Thanks,

--
Cal Peake

[1] I said in a previous email that this only happened when I set
VGA_ARB_MAX_GPUS=1, but after doing some more testing just now, it seems I
was wrong and the X server was just taking longer than expected to load.