Re: kernel bug in kvm_intel

From: Andrew Theurer
Date: Wed Nov 25 2009 - 20:35:42 EST


Tejun Heo wrote:
> Hello,
>
> 11/01/2009 08:31 PM, Avi Kivity wrote:
>>>> Here is the code in question:
>>>>
>>>> 3ae7: 75 05       jne 3aee <vmx_vcpu_run+0x26a>
>>>> 3ae9: 0f 01 c2    vmlaunch
>>>> 3aec: eb 03       jmp 3af1 <vmx_vcpu_run+0x26d>
>>>> 3aee: 0f 01 c3    vmresume
>>>> 3af1: 48 87 0c 24 xchg %rcx,(%rsp)
>>>> ^^^ fault, but not at (%rsp)
>>> Can you please post the full oops (including kernel debug messages
>>> during boot) or give me a pointer to the original message?
>> http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg23458.html
>>
>>> Also, does the faulting address coincide with any symbol?
>> No (at least, not in System.map).
>
> Has there been any progress? Is kvm + oprofile still broken?


I just tried testing the tip of kvm.git, but unfortunately I think I might be hitting a different problem, where processes run 100% in kernel mode. In my case, cpus 9 and 13 were stuck, running qemu processes. Stack backtraces for both cpus are below. FWIW, kernel.org 2.6.32-rc7 has neither this problem nor the original one.


NMI backtrace for cpu 9
CPU 9:
Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
Pid: 5687, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1 -[7947AC1]-
RIP: 0010:[<ffffffff810b802b>] [<ffffffff810b802b>] fire_user_return_notifiers+0x31/0x36
RSP: 0018:ffff88095024df08 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000800 RCX: ffff88095024c000
RDX: ffff880028340000 RSI: 0000000000000000 RDI: ffff88095024df58
RBP: ffff88095024df18 R08: 0000000000000000 R09: 0000000000000001
R10: 000000caf1fff62d R11: ffff8805b584de40 R12: 00007fffae48e0f0
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f45c69d57c0(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffff9800121056e CR3: 0000000953d36000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<#DB[1]> <<EOE>> Pid: 5687, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
Call Trace:
<NMI> [<ffffffff8100af53>] ? show_regs+0x44/0x49
[<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
[<ffffffff812e4e73>] do_nmi+0xb0/0x252
[<ffffffff812e48a0>] nmi+0x20/0x30
[<ffffffff810b802b>] ? fire_user_return_notifiers+0x31/0x36
<<EOE>> [<ffffffff8100b844>] do_notify_resume+0x62/0x69
[<ffffffff8100bf48>] ? int_check_syscall_exit_work+0x9/0x3d
[<ffffffff8100bf8e>] int_signal+0x12/0x17

NMI backtrace for cpu 13
CPU 13:
Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
Pid: 5792, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1 -[7947AC1]-
RIP: 0010:[<ffffffff8100bfb0>] [<ffffffff8100bfb0>] int_restore_rest+0x1d/0x3d
RSP: 0018:ffff88124f491f58 EFLAGS: 00000292
RAX: 0000000000000800 RBX: 00007fff9df852e0 RCX: ffff88124f490000
RDX: ffff88099ff40000 RSI: 0000000000000000 RDI: 000000000000fe2e
RBP: 00007fff9df85260 R08: ffff88124f490000 R09: 0000000000000000
R10: 0000000000000005 R11: ffff880954971da0 R12: 00007fff9df851e0
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f73b5b1d7c0(0000) GS:ffff88099ff40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f8d5a8de9d0 CR3: 0000000eb34d7000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<#DB[1]> <<EOE>> Pid: 5792, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
Call Trace:
<NMI> [<ffffffff8100af53>] ? show_regs+0x44/0x49
[<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
[<ffffffff812e4e73>] do_nmi+0xb0/0x252
[<ffffffff812e48a0>] nmi+0x20/0x30
[<ffffffff8100bfb0>] ? int_restore_rest+0x1d/0x3d
<<EOE>>
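
When several dumps like the two above need comparing, it can help to pull out just the CPU number and the RIP symbol from each one. The following is a minimal sketch, assuming the dump layout shown above ("NMI backtrace for cpu N" followed by a "RIP:" line ending in symbol+0xoff/0xsize). The function name `stuck_cpus` and both regular expressions are hypothetical and not taken from any kernel tool.

```python
import re
from typing import Dict, Tuple

# Assumed layout: each dump starts with "NMI backtrace for cpu N" and
# contains one "RIP:" line ending in "symbol+0xoff/0xsize".
CPU_RE = re.compile(r"^NMI backtrace for cpu (\d+)", re.MULTILINE)
RIP_RE = re.compile(r"^RIP:.*\s([\w.]+)\+0x([0-9a-f]+)/0x[0-9a-f]+", re.MULTILINE)


def stuck_cpus(dump: str) -> Dict[int, Tuple[str, int]]:
    """Map each cpu number to the (symbol, offset) found on its RIP line."""
    result: Dict[int, Tuple[str, int]] = {}
    headers = list(CPU_RE.finditer(dump))
    for i, header in enumerate(headers):
        # A section runs from this header to the next one (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dump)
        rip = RIP_RE.search(dump, header.end(), end)
        if rip:
            result[int(header.group(1))] = (rip.group(1), int(rip.group(2), 16))
    return result
```

Fed the two backtraces above, this maps cpu 9 to fire_user_return_notifiers at offset 0x31 and cpu 13 to int_restore_rest at offset 0x1d.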


-Andrew

