RE: [PATCH v4 net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

From: Dexuan Cui
Date: Mon Apr 12 2021 - 16:30:34 EST


> From: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Sent: Monday, April 12, 2021 7:40 AM
> > From: Andrew Lunn <andrew@xxxxxxx>
> > Sent: Monday, April 12, 2021 8:32 AM
> > > ...
> > > + /* At most num_online_cpus() + 1 interrupts are used. */
> > > + msix_index = queue->eq.msix_index;
> > > + if (WARN_ON(msix_index > num_online_cpus()))
> > > + return;
> >
> > Do you handle hot{un}plug of CPUs?
> We don't have hot{un}plug of CPU feature yet.

Hyper-V doesn't support vCPU hot plug/unplug for VMs running on it,
but potentially the VM is able to offline and online its virtual CPUs,
though this is not a typical usage at all in production system (to offline
a vCPU, the VM also needs to unbind all the para-virtualized devices'
vmbus channels from that vCPU, which is kind of undoable in a VM
that has less than about 32 vCPUs, because typically all the vCPUs are
bound to one of the vmbus channels of the NetVSC device(s) and
StorVSC device(s), and can't be (easily) unbound.

So I think the driver does need to handle cpu online/offlining properly,
but I think we can do that in some future patch, because IMO that would
need more careful consideration. IMO here the WARN_ON() is good as
it at least lets us know something unexpected (if any) happens.

> > > +static void mana_hwc_init_event_handler(void *ctx, struct gdma_queue
> > *q_self,
> > > + struct gdma_event *event)
> > > +{
> > > + struct hw_channel_context *hwc = ctx;
> > > + struct gdma_dev *gd = hwc->gdma_dev;
> > > + union hwc_init_type_data type_data;
> > > + union hwc_init_eq_id_db eq_db;
> > > + u32 type, val;
> > > +
> > > + switch (event->type) {
> > > + case GDMA_EQE_HWC_INIT_EQ_ID_DB:
> > > + eq_db.as_uint32 = event->details[0];
> > > + hwc->cq->gdma_eq->id = eq_db.eq_id;
> > > + gd->doorbell = eq_db.doorbell;
> > > + break;
> > > +
> > > + case GDMA_EQE_HWC_INIT_DATA:
> > > +
> > > + type_data.as_uint32 = event->details[0];
> > > +
> > > + case GDMA_EQE_HWC_INIT_DONE:
> > > + complete(&hwc->hwc_init_eqe_comp);
> > > + break;
> >
> > ...
> >
> > > + default:
> > > + WARN_ON(1);
> > > + break;
> > > + }
> >
> > Are these events from the firmware? If you have newer firmware with an
> > older driver, are you going to spam the kernel log with WARN_ON dumps?
> For protocol version mismatch, the host and guest will either negotiate the
> highest common version, or fail to probe. So this kind of warnings are not
> expected.

I agree, but I think we'd better remove the WARN_ON(1), which was mainly
for debugging purposem, and was added just in case.

Thanks,
Dexuan