[PATCH 1/1] Call MADV_HUGEPAGE for guest RAM allocations

From: Luiz Capitulino
Date: Fri Oct 05 2012 - 15:47:57 EST


This makes it possible for QEMU to use transparent huge pages (THP)
when transparent_hugepage/enabled=madvise. Otherwise THP is only
used when it's enabled system wide.

Signed-off-by: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
Signed-off-by: Anthony Liguori <aliguori@xxxxxxxxxx>
Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
---
exec.c | 1 +
osdep.h | 5 +++++
2 files changed, 6 insertions(+)

diff --git a/exec.c b/exec.c
index c4ed6fdef1..750008c4e1 100644
--- a/exec.c
+++ b/exec.c
@@ -2571,6 +2571,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
cpu_physical_memory_set_dirty_range(new_block->offset, size, 0xff);

qemu_ram_setup_dump(new_block->host, size);
+ qemu_madvise(new_block->host, size, QEMU_MADV_HUGEPAGE);

if (kvm_enabled())
kvm_setup_guest_memory(new_block->host, size);
diff --git a/osdep.h b/osdep.h
index cb213e0295..c5fd3d91ff 100644
--- a/osdep.h
+++ b/osdep.h
@@ -108,6 +108,11 @@ void qemu_vfree(void *ptr);
#else
#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
#endif
+#ifdef MADV_HUGEPAGE
+#define QEMU_MADV_HUGEPAGE MADV_HUGEPAGE
+#else
+#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
+#endif

#elif defined(CONFIG_POSIX_MADVISE)

> Please: if we wish to change behavior from February 2015, let's extend the
> API to allow for remote allocations in several of the ways we have already
> brainstormed rather than cause regressions.

So if I understand correctly, you broke QEMU 3 years ago (again only
noticed Jan 2018 because it requires >2 nodes and it requires a VM
larger than 1 NODE which isn't common).

QEMU before your regression in April 2015:

Finished: 30 GB mapped, 10.163159s elapsed, 2.95GB/s
Finished: 34 GB mapped, 11.806526s elapsed, 2.88GB/s
Finished: 38 GB mapped, 10.369081s elapsed, 3.66GB/s
Finished: 42 GB mapped, 12.357719s elapsed, 3.40GB/s

QEMU after your regression in April 2015:

Finished: 30 GB mapped, 8.821632s elapsed, 3.40GB/s
Finished: 34 GB mapped, 341.979543s elapsed, 0.10GB/s
Finished: 38 GB mapped, 761.933231s elapsed, 0.05GB/s
Finished: 42 GB mapped, 1188.409235s elapsed, 0.04GB/s

And now you ask open source upstream QEMU to start using a special
mempolicy if it doesn't want to keep breaking with your change, so you
don't have to apply a one liner to your unreleased so far proprietary
software that uses MADV_HUGEPAGE and depends on a special
__GFP_THISNODE behavior that broke qemu in the first place?

Thanks,
Andrea