Re: [PATCH RFC v2 1/4] virtio: fix up virtio_disable_cb

From: Willem de Bruijn
Date: Tue Apr 13 2021 - 17:45:33 EST


On Tue, Apr 13, 2021 at 3:54 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>
> On Tue, Apr 13, 2021 at 10:01:11AM -0400, Willem de Bruijn wrote:
> > On Tue, Apr 13, 2021 at 1:47 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > >
> > > virtio_disable_cb is currently a nop for split ring with event index.
> > > This is because it used to be always called from a callback when we know
> > > device won't trigger more events until we update the index. However,
> > > now that we run with interrupts enabled a lot we also poll without a
> > > callback so that is different: disabling callbacks will help reduce the
> > > number of spurious interrupts.
> >
> > The device may poll for transmit completions as a result of an interrupt
> > from virtnet_poll_tx.
> >
> > As well as asynchronously to this transmit interrupt, from start_xmit or
> > from virtnet_poll_cleantx as a result of a receive interrupt.
> >
> > As of napi-tx, transmit interrupts are left enabled to operate in standard
> > napi mode. While previously they would be left disabled for most of the
> > time, enabling only when the queue was low on descriptors.
> >
> > (in practice, for the then-common case of split ring with event index,
> > little changed, as that mode does not actually enable/disable the interrupt,
> > but looks at the consumer index in the ring to decide whether to interrupt)
> >
> > Combined, this may cause the following:
> >
> > 1. device sends a packet and fires transmit interrupt
> > 2. driver cleans interrupts using virtnet_poll_cleantx
> > 3. driver handles transmit interrupt using vring_interrupt,
> > detects that the vring is empty: !more_used(vq),
> > and records a spurious interrupt.
> >
> > I don't quite follow how suppressing interrupt suppression, i.e.,
> > skipping disable_cb, helps avoid this.
> > I'm probably missing something. Is this solving a subtly different
> > problem from the one as I understand it?
>
> I was thinking of this one:
>
> 1. device is sending packets
> 2. driver cleans them at the same time using virtnet_poll_cleantx
> 3. device fires transmit interrupts
> 4. driver handles transmit interrupts using vring_interrupt,
> detects that the vring is empty: !more_used(vq),
> and records spurious interrupts.

I think that's the same scenario.

>
>
> but even yours is also fixed I think.
>
> The common point is that a single spurious interrupt is not a problem.
> The problem only exists if there are tons of spurious interrupts with no
> real ones. For this to trigger, we keep polling the ring and while we do
> device keeps firing interrupts. So just disable interrupts while we
> poll.

But the main change in this patch is to turn some virtqueue_disable_cb
calls into no-ops. I don't understand how that helps reduce spurious
interrupts, as if anything, it keeps interrupts enabled for longer.

Another patch in the series disables callbacks* before starting to
clean the descriptors from the rx interrupt. That I do understand will
suppress additional tx interrupts that might see no work to be done. I
just don't entirely follow this patch on its own.

*(I use interrupt and callback as synonyms in this context, correct
me if I'm glossing over something essential)
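
To make the scenario concrete, here is a toy model of the race as I
understand it. It is only a sketch: every name in it (toy_vq,
toy_device_complete, toy_poll_clean, toy_interrupt, toy_run) is made up
for illustration and none of it is virtio or virtio_net code. It also
ignores the event index entirely and assumes that one flag decides
whether the device may interrupt.

```c
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_vq {
	int used;          /* completed buffers waiting for the driver */
	bool cb_enabled;   /* whether the device may raise an interrupt */
	bool irq_pending;  /* interrupt raised but not yet handled */
	int spurious;      /* interrupts that found no work to do */
};

/* Device completes one buffer; interrupts only if callbacks are enabled. */
static void toy_device_complete(struct toy_vq *vq)
{
	vq->used++;
	if (vq->cb_enabled)
		vq->irq_pending = true;
}

/* Driver reclaims all used buffers outside the interrupt handler. */
static void toy_poll_clean(struct toy_vq *vq)
{
	vq->used = 0;
}

/* Interrupt handler: an empty ring here counts as a spurious interrupt. */
static void toy_interrupt(struct toy_vq *vq)
{
	if (!vq->irq_pending)
		return;
	vq->irq_pending = false;
	if (vq->used == 0)
		vq->spurious++;
	else
		vq->used = 0;
}

/*
 * Each round: device completes a buffer, driver cleans it by polling,
 * then any pending interrupt is delivered. Returns the spurious count.
 */
static int toy_run(int rounds, bool disable_while_polling)
{
	struct toy_vq vq = { .cb_enabled = true };

	for (int i = 0; i < rounds; i++) {
		if (disable_while_polling)
			vq.cb_enabled = false;
		toy_device_complete(&vq);
		toy_poll_clean(&vq);
		if (disable_while_polling)
			vq.cb_enabled = true;
		toy_interrupt(&vq);
	}
	return vq.spurious;
}
```

In this model toy_run(1000, false) counts 1000 spurious interrupts and
toy_run(1000, true) counts none. That matches my reading of why
disabling callbacks around the clean helps, but it says nothing about
the case where disable_cb itself becomes a no-op.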