transparent huge pages breaks KVM on AMD.

From: Marc Haber
Date: Thu May 12 2016 - 16:20:29 EST


Hi David,

On Sat, Apr 23, 2016 at 07:52:46PM +0100, Dr. David Alan Gilbert wrote:
> Hmm, your problem does sound like bad hardware, but....
> If you've got a nice reliable crash, can you try turning transparent huge pages
> off on the host;
> echo never > /sys/kernel/mm/transparent_hugepage/enabled

I must have missed this hint in the middle of the "your hardware is
bad" avalanche that came over me.

I spent two weeks bisecting and kept getting "good" kernels, because
at some point during the repeated reconfigurations transparent huge
pages got turned off in the kernel configuration. After running each
kernel for 24 hours, I eventually ended up with a working 4.5 kernel.
The configuration diff was short and showed transparent huge pages,
and - finally - upon re-reading the thread I found your hint.

I can now report that 4.5, 4.5.1 and 4.5.4 reliably corrupt KVM guest
memory within the first hour of running under disk load, causing the
VM to either drop dead in the water or read random data from disk.
Rebooting fixes the VM. This happens as soon as transparent huge
pages are turned on in the host.

Turning off transparent huge pages with
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fixes the issue even without rebooting the host. Start up the VM again
and it works just fine.
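
For anyone who wants to reproduce the workaround, here is a minimal
sketch of checking and changing the setting on the host. The sysfs
path is the one quoted above; the helper names thp_mode and
thp_disable and the THP_FILE override are only illustrative
assumptions, not part of any existing tool. The kernel marks the
active mode in that file with square brackets, e.g.
"always madvise [never]".

```shell
#!/bin/sh
# Sketch only: thp_mode / thp_disable are hypothetical helper names.
# THP_FILE can be overridden, e.g. to try this against a scratch file.
THP_FILE="${THP_FILE:-/sys/kernel/mm/transparent_hugepage/enabled}"

# Print the active THP mode, i.e. the word shown in [brackets],
# e.g. "always madvise [never]" prints "never".
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "${1:-$THP_FILE}"
}

# Turn THP off at runtime (needs root on the real sysfs file).
thp_disable() {
    echo never > "${1:-$THP_FILE}"
}
```

Running thp_mode before starting the VM shows which mode the host is
in; thp_disable does the same as the echo above. The runtime setting
does not survive a host reboot; booting with transparent_hugepage=never
on the kernel command line should make it permanent.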

Is this an issue in (a) transparent huge pages, (b) KVM or (c) qemu?
Where should this issue be forwarded? Or do we just accept it and turn
transparent huge pages off?

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421