Re: kernel bug in kvm_intel

From: Avi Kivity
Date: Sun Nov 01 2009 - 05:21:46 EST


On 11/01/2009 12:00 PM, Tejun Heo wrote:
> Hello,
>
> Avi Kivity wrote:
>> Only, that merge doesn't change virt/kvm or arch/x86/kvm.
>>
>> Tejun, anything known bad about that merge? ada3fa15 kills kvm.
>
> Nothing rings a bell at the moment. How does it kill kvm? One big
> difference caused by that merge is the use of sparse areas near the top
> of the vmalloc area. This caused a vmalloc area shortage on sparc64 and
> exposed a paging code bug on ppc64 which caused the cpu to fault
> repeatedly on the same address. Maybe something similar is happening
> with kvm?


We get a page fault immediately after returning from the guest (on the very next instruction) when running with oprofile. The faulting address does not match anything that instruction accesses, so presumably it comes from one of the accesses the processor performs in order to deliver an NMI. Ordinary interrupts are masked at that point, and the fact that it happens with oprofile, which uses NMIs for its sampling interrupts, strengthens this assumption.

If this is correct, the fault is not in the NMI handler itself but in one of the memory areas the CPU reads in order to vector the NMI, which can be:

- the IDT
- the GDT
- the TSS
- the NMI stack

Except for the IDT, these are per-cpu structures, though I don't know whether they are allocated with the percpu infrastructure.
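As a rough illustration of that list, the C sketch below is a simplified model, not kernel code. It walks the memory a 64-bit CPU touches while delivering an NMI (vector 2) and reports which of those areas contains a given fault address. The struct layouts follow the Intel SDM's 64-bit IDT gate and TSS formats. The helper names nmi_fault_source and in_range, and the use of GCC's packed attribute, are hypothetical and chosen for illustration. The model also makes several assumptions: selectors are taken to refer to the GDT (TI = 0), the handler's own instruction fetch is ignored, and the no-IST case is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NMI_VECTOR 2  /* architectural NMI vector on x86 */

/* 64-bit IDT gate descriptor (16 bytes, Intel SDM layout). */
struct idt_gate64 {
    uint16_t offset_low;   /* handler RIP bits 15:0 */
    uint16_t selector;     /* code segment selector, looked up in the GDT */
    uint8_t  ist;          /* bits 2:0: IST index, 0 = stay on current stack */
    uint8_t  type_attr;    /* gate type, DPL, present bit */
    uint16_t offset_mid;   /* handler RIP bits 31:16 */
    uint32_t offset_high;  /* handler RIP bits 63:32 */
    uint32_t reserved;
};

/* 64-bit TSS (104 bytes); IST1 starts at offset 0x24. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];
    uint64_t reserved1;
    uint64_t ist[7];       /* IST1..IST7 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;
} __attribute__((packed));

_Static_assert(sizeof(struct idt_gate64) == 16, "gate is 16 bytes");
_Static_assert(sizeof(struct tss64) == 104, "TSS is 104 bytes");
_Static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 at 0x24");

/* Does [start, start + len) contain addr? */
static int in_range(uintptr_t addr, uintptr_t start, size_t len)
{
    return addr >= start && addr - start < len;
}

/*
 * Hypothetical helper: return the name of the structure touched during
 * NMI delivery that contains fault_addr, or NULL if none does.
 * Simplified: GDT-only selectors, no handler code fetch, no LDT.
 */
static const char *nmi_fault_source(const struct idt_gate64 *idt,
                                    uintptr_t gdt_base,
                                    const struct tss64 *tss,
                                    uintptr_t fault_addr)
{
    assert(idt != NULL && tss != NULL);

    /* 1. The CPU reads the 16-byte gate for vector 2 from the IDT. */
    const struct idt_gate64 *gate = &idt[NMI_VECTOR];
    if (in_range(fault_addr, (uintptr_t)gate, sizeof(*gate)))
        return "IDT";

    /* 2. It reads the gate's code segment descriptor from the GDT. */
    if (in_range(fault_addr, gdt_base + (gate->selector & ~7u), 8))
        return "GDT";

    /* 3. With a nonzero IST index it reads the new RSP from the TSS. */
    unsigned ist = gate->ist & 7u;
    if (ist == 0)
        return NULL;  /* no stack switch; not modeled here */
    size_t slot_off = offsetof(struct tss64, ist) + (ist - 1) * sizeof(uint64_t);
    if (in_range(fault_addr, (uintptr_t)tss + slot_off, sizeof(uint64_t)))
        return "TSS";

    /* 4. It aligns RSP down to 16 bytes and pushes SS, RSP, RFLAGS, CS, RIP
     *    (5 * 8 = 40 bytes; an NMI pushes no error code). */
    uint64_t top;
    memcpy(&top, (const char *)tss + slot_off, sizeof top);
    top &= ~(uint64_t)15;
    if (in_range(fault_addr, (uintptr_t)top - 40, 40))
        return "NMI stack";
    return NULL;
}
```

For example, suppose a hypothetical gate uses selector 0x10 and IST slot 3. An address 8 bytes below the top of the IST3 stack is then reported as "NMI stack", and an address outside all four areas yields NULL. To apply this to the real fault, the faulting address would be compared against the actual IDT, GDT, TSS, and NMI stack of the cpu that took it.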

Here is the code in question:

    3ae7:  75 05                  jne    3aee <vmx_vcpu_run+0x26a>
    3ae9:  0f 01 c2               vmlaunch
    3aec:  eb 03                  jmp    3af1 <vmx_vcpu_run+0x26d>
    3aee:  0f 01 c3               vmresume
    3af1:  48 87 0c 24            xchg   %rcx,(%rsp)

                                  ^^^ fault, but not at (%rsp)

    3af5:  48 89 81 18 01 00 00   mov    %rax,0x118(%rcx)
    3afc:  48 89 99 30 01 00 00   mov    %rbx,0x130(%rcx)




--

error compiling committee.c: too many arguments to function
