Re: [PATCH] virtio_mem: break device on remove

From: David Hildenbrand
Date: Mon Jan 17 2022 - 05:25:21 EST


On 17.01.22 09:40, Michael S. Tsirkin wrote:
> On Mon, Jan 17, 2022 at 09:31:56AM +0100, David Hildenbrand wrote:
>> On 17.01.22 08:55, Michael S. Tsirkin wrote:
>>> On Mon, Jan 17, 2022 at 02:40:11PM +0800, Jason Wang wrote:
>>>>
>>>> 在 2022/1/15 上午5:43, Michael S. Tsirkin 写道:
>>>>> A common pattern for device reset is currently:
>>>>> vdev->config->reset(vdev);
>>>>> .. cleanup ..
>>>>>
>>>>> reset prevents new interrupts from arriving and waits for interrupt
>>>>> handlers to finish.
>>>>>
>>>>> However if - as is common - the handler queues a work request which is
>>>>> flushed during the cleanup stage, we have code adding buffers / trying
>>>>> to get buffers while device is reset. Not good.
>>>>>
>>>>> This was reproduced by running
>>>>> modprobe virtio_console
>>>>> modprobe -r virtio_console
>>>>> in a loop, and this reasoning seems to apply to virtio mem though
>>>>> I could not reproduce it there.
>>>>>
>>>>> Fix this up by calling virtio_break_device + flush before reset.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
>>>>> ---
>>>>> drivers/virtio/virtio_mem.c | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>>>>> index 38becd8d578c..33b8a118a3ae 100644
>>>>> --- a/drivers/virtio/virtio_mem.c
>>>>> +++ b/drivers/virtio/virtio_mem.c
>>>>> @@ -2888,6 +2888,8 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>>>>> virtio_mem_deinit_hotplug(vm);
>>>>> /* reset the device and cleanup the queues */
>>>>> + virtio_break_device(vdev);
>>>>> + flush_work(&vm->wq);
>>>>
>>>>
>>>> We set vm->removing to true and call cancel_work_sync() in
>>>> virtio_mem_deinit_hotplug(). Isn't is sufficient?
>>>>
>>>> Thanks
>>>
>>>
>>> Hmm I think you are right. David, I will drop this for now.
>>> Up to you to consider whether some central capability will be
>>> helpful as a replacement for the virtio-mem specific "removing" flag.
>>
>> It's all a bit tricky because we also have to handle pending timers and
>> pending memory onlining/offlining operations in a controlled way. Maybe
>> we could convert to virtio_break_device() and use the
>> &dev->vqs_list_lock as a replacement for the removal_lock. However, I'm
>> not 100% sure if it's nice to use that lock from
>> drivers/virtio/virtio_mem.c directly.
>
> We could add an API if you like. Or maybe it makes sense to add a
> separate one that lets you find out that removal started. Need to figure
> out how to handle suspend too then ...
> Generally we have these checks that device is not going away
> sprinkled over all drivers and I don't like it, but
> it's not easy to build a sane API to handle it, especially
> for high speed things when we can't take locks because performance.

The interesting case might be how to handle virtio_mem_retry(), whereby
we conditionally queue work if !removing.

Having that said, in an ideal world we could deny removal requests for
virtio_mem completely if there is still any memory added to the system ...


--
Thanks,

David / dhildenb