Re: [PATCH] arm/arm64: KVM: send SIGBUS error to qemu

From: gengdongjiu
Date: Fri Mar 24 2017 - 05:45:37 EST


Hi James,
thanks for your review.

On 2017/3/23 23:06, James Morse wrote:
> Hi Dongjiu Geng,
>
> On 23/03/17 13:01, Dongjiu Geng wrote:
>> when the pfn is KVM_PFN_ERR_HWPOISON, it indicates to send
>> SIGBUS signal from KVM's fault-handling code to qemu, qemu
>> can handle this signal according to the fault address.
>
> I'm afraid I beat you to it on this one:
> https://www.spinics.net/lists/arm-kernel/msg568919.html
>
> (Are you the same gengdj who ask me to post that patch?:
> https://lkml.org/lkml/2017/3/5/187 )

Oh, yes, it is me. recently I do not check my gmail and think you are not reply mail.
it is great that you upstream this patch.

>
> We don't need upstream KVM to do this until either arm or arm64 has
> ARCH_SUPPORTS_MEMORY_FAILURE. Punit and Tyler have discovered problems with the
> way arm64's hugepage and hwpoison interact:
> https://www.spinics.net/lists/arm-kernel/msg568995.html
ok, thanks James. do you know when the arm or arm64 will have ARCH_SUPPORTS_MEMORY_FAILURE?
>
>
> Some comments on the differences:
>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1307ec400de3 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1237,6 +1237,20 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>> __coherent_cache_guest_page(vcpu, pfn, size);
>> }
>>
>> +static void kvm_send_hwpoison_signal(unsigned long address,
>> + struct task_struct *tsk)
>> +{
>> + siginfo_t info;
>> +
>> + info.si_signo = SIGBUS;
>> + info.si_errno = 0;
>> + info.si_code = BUS_MCEERR_AR;
>> + info.si_addr = (void __user *)address;
>> + info.si_addr_lsb = PAGE_SHIFT;
>
> Any version of this patch should handle hugepage for the sizes KVM uses in its
> stage2 mappings. By just passing PAGE_SHIFT you let the guest fault for each
> page that makes up the hugepage.
>
>
>> +
>> + send_sig_info(SIGBUS, &info, tsk);
>> +}
>> +
>> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> struct kvm_memory_slot *memslot, unsigned long hva,
>> unsigned long fault_status)
>> @@ -1309,6 +1323,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> if (is_error_noslot_pfn(pfn))
>> return -EFAULT;
>>
>> + if (is_error_hwpoison_pfn(pfn)) {
>> + kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn),
>> + current);
>> + return -EFAULT;
>
> This will return -EFAULT from the KVM_RUN ioctl(). Is Qemu expected to know it
> should try again? This is indistinguishable from the is_error_noslot_pfn() error
> above.
>
> x86 returns 0 from this path, kvm_handle_bad_page() in arch/x86/kvm/mmu.c as the
> SIGBUS should arrive first. If the SIGBUS is handled the error has been resolved
> and Qemu can call KVM_RUN again. Returning an error and sending SIGBUS suggests
> there are two problems.
thanks for that. I think your Statement is reasonable.
>
>
>> + }
>> +
>> if (kvm_is_device_pfn(pfn)) {
>> mem_type = PAGE_S2_DEVICE;
>> flags |= KVM_S2PTE_FLAG_IS_IOMAP;
>
>
>
> Thanks,
>
> James
>
>
> .
>