Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

From: Tetsuo Handa
Date: Sat Jan 20 2018 - 09:24:23 EST


Michael S. Tsirkin wrote:
> > > > >> + * the page if the vq is full. We are adding one entry each time,
> > > > >> + * which essentially results in no memory allocation, so the
> > > > >> + * GFP_KERNEL flag below can be ignored.
> > > > >> + */
> > > > >> + if (vq->num_free) {
> > > > >> + err = virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
> > > > >
> > > > > Should we kick here? At least when ring is close to
> > > > > being full. Kick at half way full?
> > > > > Otherwise it's unlikely ring will
> > > > > ever be cleaned until we finish the scan.
> > > >
> > > > Since this add_one_sg() is called between spin_lock_irqsave(&zone->lock, flags)
> > > > and spin_unlock_irqrestore(&zone->lock, flags), it is not permitted to sleep.
> > >
> > > kick takes a while sometimes but it doesn't sleep.
> >
> > I don't know about virtio. But the purpose of kicking here is to wait for pending data
> > to be flushed in order to increase vq->num_free, isn't it?
>
> It isn't. It's to wake up device out of sleep to make it start
> processing the pending data. If device isn't asleep, it's a nop.

If vq->num_free == 0, we need to wait until vq->num_free > 0 before
virtqueue_add_inbuf() can succeed. When will vq->num_free++ be called?

You said virtqueue_kick() is a no-op if the device is not asleep.
Then there is no guarantee that calling virtqueue_kick() will make
vq->num_free > 0. Are you saying that

  virtqueue_kick(vq);
  while (!vq->num_free)
    virtqueue_get_buf(vq, &unused);
  err = virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
  BUG_ON(err);

sequence from IRQ-disabled atomic context is safe? If not, what is
the point of calling virtqueue_kick() when the ring is close to being
(half way) full? We can't guarantee that all data has been sent to QEMU, after all.
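
To make the concern concrete, here is a toy userspace model of the ring
accounting. It is only a sketch: none of the toy_* names exist in the
kernel, and it assumes that free slots come back only after the device
has consumed a buffer and the driver has reclaimed it, while a kick
merely wakes the device.

```c
/* Toy model only: NOT the real virtio API. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_vq {
	unsigned int num_free;	/* slots the driver may still fill */
	unsigned int pending;	/* buffers added, not yet consumed by the device */
	unsigned int used;	/* buffers consumed, not yet reclaimed by the driver */
	bool device_asleep;
};

/* Driver side: add one buffer; fails when the ring is full. */
static int toy_add_inbuf(struct toy_vq *vq)
{
	if (!vq->num_free)
		return -1;
	vq->num_free--;
	vq->pending++;
	return 0;
}

/* Driver side: a kick only wakes the device; it never changes num_free. */
static void toy_kick(struct toy_vq *vq)
{
	vq->device_asleep = false;
}

/* Device side: in reality this runs asynchronously, on its own schedule. */
static void toy_device_run(struct toy_vq *vq)
{
	if (vq->device_asleep)
		return;
	vq->used += vq->pending;
	vq->pending = 0;
}

/* Driver side: reclaim one used buffer; only this step frees a slot. */
static bool toy_get_buf(struct toy_vq *vq)
{
	if (!vq->used)
		return false;
	vq->used--;
	vq->num_free++;
	return true;
}
```

In this model, a loop of the form "kick; while (!num_free) get_buf;"
makes progress only once toy_device_run() has happened, and the driver
has no control over when that is.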



Also, why does the cmd id matter? If VIRTIO_BALLOON_F_FREE_PAGE_VQ does not
guarantee atomicity, I don't see the point of communicating the cmd id
between QEMU and the guest kernel. Just an EOF marker should be enough.
I would like to see the QEMU-side changes in order to review the
guest-kernel-side changes.