Re: VM boot failure on nodes not having DMA32 zone

From: Liang C
Date: Wed Jul 25 2018 - 04:37:15 EST


Thank you very much for the quick reply and confirmation! I just made
and submitted a patch according to your advice.

Thanks,
Liang

On Wed, Jul 25, 2018 at 2:05 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> On 24/07/2018 09:53, Liang C wrote:
>> Hi,
>>
>> We have a situation where our qemu processes need to be launched under
>> cgroup cpuset.mems control. This introduces an similar issue that was
>> discussed a few years ago. The difference here is that for our case,
>> not being able to allocate from DMA32 zone is a result a cgroup
>> restriction not mempolicy enforcement. Here is the steps to reproduce
>> the failure,
>>
>> mkdir /sys/fs/cgroup/cpuset/nodeX (where X is a node not having DMA32 zone)
>> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.mems
>> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.cpus
>> echo 1 > /sys/fs/cgroup/cpuset/node0/cpuset.mem_hardwall
>> echo $$ > /sys/fs/cgroup/cpuset/nodeX/tasks
>>
>> #launch a virtual machine
>> kvm_init_vcpu failed: Cannot allocate memory
>>
>> There are workarounds, like always putting qemu processes onto the
>> node with DMA32 zone or not restricting qemu processes memory
>> allocation until that DMA32 alloc finishes (difficult to be precise).
>> But we would like to find a way to address the root cause.
>>
>> Considering the fact that the pae_root shadow should not be needed
>> when ept is in use, which is indeed our case - ept is always available
>> for us (guessing this is the same case for most of other users), we
>> made a patch roughly like this,
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index d594690..1d1b61e 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
>> vcpu->arch.mmu.translate_gpa = translate_gpa;
>> vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
>>
>> - return alloc_mmu_pages(vcpu);
>> + return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
>> }
>>
>> void kvm_mmu_setup(struct kvm_vcpu *vcpu)
>>
>>
>> It works through our test cases. But we would really like to have your
>> insight on this patch before applying it in production environment and
>> contributing it back to the community. Thanks in advance for any help
>> you may provide!
>
> Yes, this looks good. However, I'd place the "if" in alloc_mmu_pages
> itself.
>
> Thanks,
>
> Paolo