Re: __schedule #DF splat

From: Jan Kiszka
Date: Sun Jun 29 2014 - 06:32:48 EST


On 2014-06-29 12:24, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
>> On 2014-06-29 08:46, Gleb Natapov wrote:
>>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>>>> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)
>>>>
>>>> kvm injects the #PF into the guest.
>>>>
>>>> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
>>>> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>>>> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
>>>>
>>>> Second #PF at the same address and kvm injects the #DF.
>>>>
>>>> BUT(!), why?
>>>>
>>>> I probably am missing something but WTH are we pagefaulting at a
>>>> user address in context_switch() while doing a lockdep call, i.e.
>>>> spin_release? We're not touching any userspace gunk there AFAICT.
>>>>
>>>> Is this an async pagefault or so which kvm is doing so that the guest
>>>> rip is actually pointing at the wrong place?
>>>>
>>> There is nothing in the trace that points to an async pagefault as far as I see.
>>>
>>>> Or something else I'm missing, most probably...
>>>>
>>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
>>> kvm_multiple_exception() to see which two exceptions are combined into #DF.
>>>
>>
>> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
>> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
>> when patch-disabling the vmport in QEMU.
>>
>> Let me know if I can help with the analysis.
>>
> Bisection would be great of course. One thing that is special about
> vmport that comes to mind is that it reads vcpu registers to userspace and
> writes them back. IIRC "info registers" does the same. Can you see if the
> problem is reproducible with vmport disabled, but doing "info registers"
> in the qemu console? Although the trace does not show any exits to userspace
> near the failure...

Yes, "info registers" crashes the guest after a while as well (with a
different backtrace due to the different context).
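
For reference, the rule that turns the two back-to-back #PFs in the quoted
trace into a #DF can be sketched as below. This is a standalone illustration
of the architectural exception-class rule (benign / contributory / page
fault) that kvm_multiple_exception() applies, not a copy of the KVM source;
the helper name combines_to_double_fault() is hypothetical and only used
here for illustration.

```c
#include <stdbool.h>

/* Architectural x86 exception vector numbers. */
enum {
	DE_VECTOR = 0,   /* divide error */
	DF_VECTOR = 8,   /* double fault */
	TS_VECTOR = 10,  /* invalid TSS */
	NP_VECTOR = 11,  /* segment not present */
	SS_VECTOR = 12,  /* stack-segment fault */
	GP_VECTOR = 13,  /* general protection */
	PF_VECTOR = 14,  /* page fault */
};

enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

/* Classify a vector as benign, contributory, or page fault. */
static enum excpt_class exception_class(int vector)
{
	switch (vector) {
	case PF_VECTOR:
		return EXCPT_PF;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

/*
 * Hypothetical helper: true if raising `second` while `first` is still
 * being delivered yields #DF instead of serial handling.
 */
static bool combines_to_double_fault(int first, int second)
{
	enum excpt_class c1 = exception_class(first);
	enum excpt_class c2 = exception_class(second);

	return (c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY) ||
	       (c1 == EXCPT_PF && c2 != EXCPT_BENIGN);
}
```

With this rule, combines_to_double_fault(PF_VECTOR, PF_VECTOR) is true,
which matches the trace: the second #PF arrives while the first is still
pending, so a #DF is injected. It does not explain why the first #PF was
still pending, which is the open question in this thread.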

Jan

