Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

From: Takashi Iwai
Date: Wed Mar 18 2015 - 13:46:50 EST


At Wed, 18 Mar 2015 18:43:52 +0100,
Takashi Iwai wrote:
>
> At Wed, 18 Mar 2015 15:16:42 +0100,
> Takashi Iwai wrote:
> >
> > At Sun, 15 Mar 2015 09:17:15 +0100,
> > Stefan Seyfried wrote:
> > >
> > > Hi all,
> > >
> > > in 4.0-rc I have recently seen a few crashes, always when running
> > > KVM guests (IIRC). Today I was able to capture a crash dump, this
> > > is the backtrace from dmesg.txt:
> > >
> > > [242060.604870] PANIC: double fault, error_code: 0x0
> > > [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
> > > [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
> > > [242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
> > > [242060.604885] RIP: 0010:[<ffffffff816834ad>] [<ffffffff816834ad>] page_fault+0xd/0x30
> > > [242060.604893] RSP: 0018:00007fffa55eafb8 EFLAGS: 00010016
> > > [242060.604895] RAX: 000000000000aa40 RBX: 0000000000000001 RCX: ffffffff81682237
> > > [242060.604896] RDX: 000000000000aa40 RSI: 0000000000000000 RDI: 00007fffa55eb078
> > > [242060.604898] RBP: 00007fffa55f1c1c R08: 0000000000000008 R09: 0000000000000000
> > > [242060.604900] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000004a
> > > [242060.604902] R13: 00007ffa356b5d60 R14: 000000000000000f R15: 00007ffa3556cf20
> > > [242060.604904] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
> > > [242060.604906] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [242060.604908] CR2: 00007fffa55eafa8 CR3: 0000000002d7e000 CR4: 00000000000427e0
> > > [242060.604909] Stack:
> > > [242060.604942] BUG: unable to handle kernel paging request at 00007fffa55eafb8
> > > [242060.604995] IP: [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
> > > [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0
> > > [242060.605078] Oops: 0000 [#1] PREEMPT SMP
> > > [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm
> > > [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop
> > > [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
> > > [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
> > > [242060.605396] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
> > > [242060.605396] RIP: 0010:[<ffffffff81005b44>] [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
> > > [242060.605396] RSP: 0018:ffff88023bc84e88 EFLAGS: 00010046
> > > [242060.605396] RAX: 00007fffa55eafc0 RBX: 00007fffa55eafb8 RCX: ffff88023bc7ffc0
> > > [242060.605396] RDX: 0000000000000000 RSI: ffff88023bc84f58 RDI: 0000000000000000
> > > [242060.605396] RBP: ffff88023bc83fc0 R08: ffffffff81a2fe15 R09: 0000000000000020
> > > [242060.605396] R10: 0000000000000afb R11: ffff88023bc84bee R12: ffff88023bc84f58
> > > [242060.605396] R13: 0000000000000000 R14: ffffffff81a2fe15 R15: 0000000000000000
> > > [242060.605396] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
> > > [242060.605396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [242060.605396] CR2: 00007fffa55eafb8 CR3: 0000000002d7e000 CR4: 00000000000427e0
> > > [242060.605396] Stack:
> > > [242060.605396] 0000000002d7e000 0000000000000008 ffff88023bc84ee8 00007fffa55eafb8
> > > [242060.605396] 0000000000000000 ffff88023bc84f58 00007fffa55eafb8 0000000000000040
> > > [242060.605396] 00007ffa356b5d60 000000000000000f 00007ffa3556cf20 ffffffff81005c36
> > > [242060.605396] Call Trace:
> > > [242060.605396] [<ffffffff81005c36>] show_regs+0x86/0x210
> > > [242060.605396] [<ffffffff8104636f>] df_debug+0x1f/0x30
> > > [242060.605396] [<ffffffff810041a4>] do_double_fault+0x84/0x100
> > > [242060.605396] [<ffffffff81683088>] double_fault+0x28/0x30
> > > [242060.605396] [<ffffffff816834ad>] page_fault+0xd/0x30
> > > [242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89
> > > [242060.605396] RIP [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
> > > [242060.605396] RSP <ffff88023bc84e88>
> > > [242060.605396] CR2: 00007fffa55eafb8
> > >
> > > I would not totally rule out a hardware problem, since this machine had
> > > another weird crash where it crashed and the bios beeper was constant
> > > on until I hit the power button for 5 seconds.
> > >
> > > Unfortunately, I cannot load the crashdump with the crash version in
> > > openSUSE Tumbleweed, so the backtrace is all I have for now.
> >
> > Just "me too", I'm getting the very same crash out of sudden with the
> > recent 4.0-rc. Judging from the very same pattern (usually crash
> > happened while using KVM (-smp 4) and kernel builds with -j8), I don't
> > think it's a hardware problem.
>
> The git bisection pointed to the commit:
> commit b926e6f61a26036ee9eabe6761483954d481ad25
> x86, traps: Fix ist_enter from userspace
>
> And reverting this on top of the latest Linus tree seems working.
> Seife, could you verify on your machine, too?

Argh, false positive. Right after I wrote this mail, I got the very
same crash. I seem to need running the test much longer than I
thought.

But somehow the commits around the above smell suspicious...


Takashi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/