RE: [PATCH] Drivers: vmbus: Check for channel allocation before looking up relids

From: Dexuan Cui
Date: Fri Feb 10 2023 - 14:18:34 EST


> From: Mohammed Gamal <mgamal@xxxxxxxxxx>
> Sent: Friday, February 10, 2023 1:12 AM
> > ...
> > Upon crash, Linux sends a CHANNELMSG_UNLOAD message to the host,
> > and the host is supposed to quiesce/reset the VMBus devices, so
> > normally we should not see a crash in relid2channel().
>
> Does this not happen in the case of kdump? Shouldn't a
> CHANNELMSG_UNLOAD message be sent to the host in that case as well?

The message is sent to the host in the case of kdump.

> > > > [ 21.906679] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 05/18/2018
> >
> > I guess you see the crash because you're running an old Hyper-V,
> > probably Windows Server 2016 or 2019, which may be unable to
> > reliably handle the guest's CHANNELMSG_UNLOAD message.
>
> We've actually seen this on Windows Server 2016, 2019, and 2022.

I didn't expect this to happen on WS 2022. It looks like some of the
VMBus devices are not reset by the host upon receiving the
CHANNELMSG_UNLOAD message. If you can record all the 'relids' in the
first kernel beforehand, and print the 'relid' in relid2channel(), we'll
be able to tell which device is not being reset. It may also be a good
idea to print the 'relid' in the newly added warning, for debugging
purposes.
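The idea of printing the relid in the new warning can be illustrated with a minimal userspace sketch. Everything here is hypothetical and simplified: `sketch_channels` stands in for the channel table that is not yet allocated in the kdump kernel, `SKETCH_MAX_RELIDS` is an arbitrary bound, and none of these names are the real vmbus code. The point is only that the lookup bails out with a message naming the offending relid instead of dereferencing a NULL table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bound on relids; not the kernel's real limit. */
#define SKETCH_MAX_RELIDS 16U

/* Hypothetical, stripped-down channel object. */
struct sketch_channel {
	uint32_t relid;
};

/* Hypothetical relid -> channel table; NULL until it is allocated. */
static struct sketch_channel **sketch_channels;

/* Allocate the table (stands in for the normal VMBus connect path). */
int sketch_alloc_table(void)
{
	sketch_channels = calloc(SKETCH_MAX_RELIDS, sizeof(*sketch_channels));
	return sketch_channels ? 0 : -1;
}

/* Record a channel under its relid; caller must have allocated the table. */
void sketch_map_channel(struct sketch_channel *ch)
{
	assert(sketch_channels != NULL && ch->relid < SKETCH_MAX_RELIDS);
	sketch_channels[ch->relid] = ch;
}

/* Look up a channel, reporting the relid when the lookup cannot proceed. */
struct sketch_channel *sketch_relid2channel(uint32_t relid)
{
	if (sketch_channels == NULL) {
		/* Table not allocated yet: print the relid so the device
		 * the host failed to reset can be identified. */
		fprintf(stderr, "relid2channel: relid=%u: no channels mapped\n",
			(unsigned)relid);
		return NULL;
	}
	if (relid >= SKETCH_MAX_RELIDS) {
		fprintf(stderr, "relid2channel: relid=%u out of range\n",
			(unsigned)relid);
		return NULL;
	}
	/* May still be NULL if no channel was mapped for this relid. */
	return sketch_channels[relid];
}
```

Comparing the relids printed this way against the ones recorded in the first kernel would show which device is still sending messages. In kernel code the message would more likely go through pr_warn_once() than fprintf(), to avoid flooding the log during a crash.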