Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM

From: David Hildenbrand
Date: Mon Mar 09 2020 - 07:00:09 EST


On 09.03.20 11:14, Michael S. Tsirkin wrote:
> On Mon, Mar 09, 2020 at 10:03:14AM +0100, David Hildenbrand wrote:
>> On 08.03.20 05:47, Tyler Sanderson wrote:
>>> Tested-by: Tyler Sanderson <tysand@xxxxxxxxxx>
>>>
>>> Test setup: VM with 16 CPUs and 64 GB RAM, running Debian 10. We have a
>>> 42 GB file full of random bytes that we continually cat to /dev/null.
>>> This fills the page cache as the file is read. Meanwhile we trigger
>>> the balloon to inflate, with a target size of 53 GB. This setup causes
>>> the balloon inflation to pressure the page cache as the page cache is
>>> also trying to grow. Afterwards we shrink the balloon back to zero (so
>>> total deflate = total inflate).
>>>
>>> Without patch (kernel 4.19.0-5):
>>> Inflation never reaches the target until we stop the "cat file >
>>> /dev/null" process. Total inflation time was 542 seconds. The longest
>>> period that made no net forward progress was 315 seconds (see attached
>>> graph).
>>> Result of "grep balloon /proc/vmstat" after the test:
>>> balloon_inflate 154828377
>>> balloon_deflate 154828377
>>>
>>> With patch (kernel 5.6.0-rc4+):
>>> Total inflation duration was 63 seconds. No deflate-queue activity
>>> occurs when pressuring the page-cache.
>>> Result of "grep balloon /proc/vmstat" after the test:
>>> balloon_inflate 12968539
>>> balloon_deflate 12968539
>>>
>>> Conclusion: This patch fixes the issue. In the test it reduced
>>> inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
>>> But more importantly, if we hadn't killed the "cat file > /dev/null"
>>> process then, without the patch, the inflation process would never
>>> have reached the target.
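The ratios in the conclusion can be rechecked from the numbers in the report. A minimal sketch, assuming the `grep balloon /proc/vmstat` output is one `name value` pair per line (as shown above); the helper name `parse_vmstat` is hypothetical. The activity ratio comes out at about 11.9, which the report rounds to 12x.

```python
# Sketch: recompute the ratios quoted in the test report from the
# "grep balloon /proc/vmstat" output and the measured inflation times.

def parse_vmstat(text):
    """Parse 'name value' lines (as printed by /proc/vmstat) into a dict."""
    counters = {}
    for line in text.strip().splitlines():
        name, value = line.split()
        counters[name] = int(value)
    return counters

# Counters copied from the report (kernel 4.19.0-5 vs. 5.6.0-rc4+).
without_patch = parse_vmstat("""
balloon_inflate 154828377
balloon_deflate 154828377
""")
with_patch = parse_vmstat("""
balloon_inflate 12968539
balloon_deflate 12968539
""")

# Reduction in inflate/deflate activity (total deflate = total inflate here).
activity_ratio = without_patch["balloon_inflate"] / with_patch["balloon_inflate"]
# Reduction in total inflation time, in seconds: 542 s without vs. 63 s with.
time_ratio = 542 / 63

print(f"inflate/deflate activity reduced {activity_ratio:.1f}x")  # 11.9x
print(f"inflation time reduced {time_ratio:.1f}x")  # 8.6x
```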
>>>
>>> Attached is a png of a graph showing the problematic behavior without
>>> this patch. It shows deflate-queue activity increasing linearly while
>>> balloon size stays constant over the course of more than 8 minutes of
>>> the test.
>>
>> Thanks a lot for the extended test!
>
>
> Given we shipped this for a long time, I think the best way
> to make progress is to merge 1/3, 2/3 right now, and 3/3
> in the next release.

Agreed.

--
Thanks,

David / dhildenb