Re: kernel panic on resume from S3 - stumped

From: Tim Hockin
Date: Sun Dec 30 2012 - 20:23:12 EST


On Sun, Dec 30, 2012 at 2:55 PM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
> On Saturday, December 29, 2012 11:17:11 PM Tim Hockin wrote:
>> Best guess:
>>
>> With 'noapic', I see the "irq 5: nobody cared" message on resume,
>> along with 10000 IRQ5 counts in /proc/interrupts (the devices claiming
>> that IRQ are quiescent).
>>
>> Without 'noapic' that must be triggering something else to go haywire,
>> perhaps the AER logic (though that is all MSI, so probably not). I'm
>> flying blind on those boots.
>>
>> I bet that, if I can recall how to re-enable IRQ5, I'll see it
>> continuously asserting. Chipset or BIOS bug maybe. I don't know if I
>> had AER enabled under Lucid, so that might be the difference.
>>
>> I'll try a vanilla kernel next, maybe hack on AER a bit, to see if I
>> can make it progress.
>
> I wonder what happens if you simply disable AER for starters?
>
> There is the pci=noaer kernel command line switch for that.

That still panics on resume. Damn. I really think it is down to that
interrupt storm at resume. Something somewhere is getting stuck
asserting, and we don't know how to EOI it. PIC vs APIC is just
changing the operating mode.
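
One way to check whether a line really is storming is to sample /proc/interrupts twice and compare the per-IRQ totals. The following is only a minimal sketch, not something taken from this thread: the helper names (parse_irq_counts, irq_delta, watch_irq) are hypothetical, and it assumes the usual /proc/interrupts layout of a CPU header row followed by "label: count count ... description" rows. If the kernel has already disabled the line after a "nobody cared" report, the count will probably stay flat until the line is re-enabled.

```python
import time


def parse_irq_counts(text):
    """Return {irq_label: total count summed over CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    # The header row lists one column per online CPU (CPU0 CPU1 ...).
    ncpus = len(lines[0].split())
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        # Sum at most ncpus numeric fields; stop at the first non-numeric one,
        # since some rows (e.g. ERR) carry fewer counters than there are CPUs.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            total += int(field)
        counts[label.strip()] = total
    return counts


def irq_delta(before, after, irq):
    """Number of interrupts counted for `irq` between two parsed samples."""
    return after.get(irq, 0) - before.get(irq, 0)


def watch_irq(irq="5", interval=1.0, path="/proc/interrupts"):
    """Sample the file twice, `interval` seconds apart, and return the delta."""
    with open(path) as f:
        first = parse_irq_counts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_irq_counts(f.read())
    return irq_delta(first, second, irq)
```

Calling watch_irq("5") right after a resume would give a rough interrupts-per-second figure for IRQ 5. A large delta while the devices sharing the line are idle would be consistent with something stuck asserting.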

Now the question is whether I am going to track through Intel errata
(more than I have already) and through chipset docs to figure out what
it could be, or just leave it at noapic.

I've already got one new PCI quirk to code up.

> Thanks,
> Rafael
>
>
>> On Sat, Dec 29, 2012 at 10:19 PM, Tim Hockin <thockin@xxxxxxxxxx> wrote:
>> > Quick update: booting with 'noapic' on the commandline seems to make
>> > it resume successfully.
>> >
>> > The main dmesg diffs (other than the obvious "Skipping IOAPIC probe"
>> > and IRQ number diffs) are:
>> >
>> > -nr_irqs_gsi: 40
>> > +nr_irqs_gsi: 16
>> >
>> > -NR_IRQS:16640 nr_irqs:776 16
>> > +NR_IRQS:16640 nr_irqs:368 16
>> >
>> > -system 00:0a: [mem 0xfec00000-0xfec00fff] could not be reserved
>> > +system 00:0a: [mem 0xfec00000-0xfec00fff] has been reserved
>> >
>> > and a new warning about irq 5: nobody cared (try booting with the
>> > "irqpoll" option)
>> >
>> > I'll see if I can sort out further differences, but I thought it was
>> > worth sending this new info along, anyway.
>> >
>> > It did not require 'noapic' on the Lucid (2.6.32?) kernel.
>> >
>> >
>> > On Sat, Dec 29, 2012 at 9:34 PM, Tim Hockin <thockin@xxxxxxxxxx> wrote:
>> >> Running a suspend with pm_trace set, I get:
>> >>
>> >> aer 0000:00:03.0:pcie02: hash matches
>> >>
>> >> I don't know what magic might be needed here, though.
>> >>
>> >> I guess next step is to try to build a non-distro kernel.
>> >>
>> >> On Sat, Dec 29, 2012 at 1:57 PM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> >>> On Saturday, December 29, 2012 12:03:13 PM Tim Hockin wrote:
>> >>>> 4 days ago I had Ubuntu Lucid running on this computer. Suspend and
>> >>>> resume worked flawlessly every time.
>> >>>>
>> >>>> Then I upgraded to Ubuntu Precise.
>> >>>
>> >>> Well, do you use a distro kernel or a kernel.org kernel?
>> >>>
>> >>>> Suspend seems to work, but resume
>> >>>> fails every time. The video never initializes. By the flashing
>> >>>> keyboard lights, I guess it's a kernel panic. It fails from the Live
>> >>>> CD and from a fresh install.
>> >>>>
>> >>>> Here is my debug so far.
>> >>>>
>> >>>> Install all updates (3.2 kernel, nouveau driver)
>> >>>> Reboot
>> >>>> Try suspend = fails
>> >>>>
>> >>>> Install Ubuntu's linux-generic-lts-quantal (3.5 kernel, nouveau driver)
>> >>>> Reboot
>> >>>> Try suspend = fails
>> >>>>
>> >>>> Install nVidia's 304 driver
>> >>>> Reboot
>> >>>> Try suspend = fails
>> >>>>
>> >>>> From within X:
>> >>>> echo core > /sys/power/pm_test
>> >>>> echo mem > /sys/power/state
>> >>>> The system acts like it is going to sleep, and then wakes up a few
>> >>>> seconds later. dmesg shows:
>> >>>>
>> >>>> [ 1230.083404] ------------[ cut here ]------------
>> >>>> [ 1230.083410] WARNING: at
>> >>>> /build/buildd/linux-lts-quantal-3.5.0/kernel/power/suspend_test.c:53
>> >>>> suspend_test_finish+0x86/0x90()
>> >>>> [ 1230.083411] Hardware name: To Be Filled By O.E.M.
>> >>>> [ 1230.083412] Component: resume devices, time: 14424
>> >>>> [ 1230.083412] Modules linked in: snd_emu10k1_synth snd_emux_synth
>> >>>> snd_seq_virmidi snd_seq_midi_emul bnep rfcomm parport_pc ppdev
>> >>>> nvidia(PO) snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm snd_page_alloc
>> >>>> snd_util_mem snd_hwdep snd_seq_midi snd_rawmidi snd_seq_midi_event
>> >>>> snd_seq snd_timer coretemp snd_seq_device kvm_intel kvm snd
>> >>>> ghash_clmulni_intel soundcore aesni_intel btusb cryptd aes_x86_64
>> >>>> bluetooth i7core_edac edac_core microcode mac_hid lpc_ich mxm_wmi
>> >>>> shpchp serio_raw wmi hid_generic lp parport usbhid hid r8169
>> >>>> pata_marvell
>> >>>> [ 1230.083445] Pid: 3329, comm: bash Tainted: P O 3.5.0-21-generic
>> >>>> #32~precise1-Ubuntu
>> >>>> [ 1230.083446] Call Trace:
>> >>>> [ 1230.083448] [<ffffffff81052c9f>] warn_slowpath_common+0x7f/0xc0
>> >>>> [ 1230.083452] [<ffffffff81052d96>] warn_slowpath_fmt+0x46/0x50
>> >>>> [ 1230.083455] [<ffffffff8109b836>] suspend_test_finish+0x86/0x90
>> >>>> [ 1230.083457] [<ffffffff8109b53b>] suspend_devices_and_enter+0x10b/0x200
>> >>>> [ 1230.083460] [<ffffffff8109b701>] enter_state+0xd1/0x100
>> >>>> [ 1230.083463] [<ffffffff8109b74b>] pm_suspend+0x1b/0x60
>> >>>> [ 1230.083465] [<ffffffff8109a7a5>] state_store+0x45/0x70
>> >>>> [ 1230.083467] [<ffffffff81331d2f>] kobj_attr_store+0xf/0x30
>> >>>> [ 1230.083471] [<ffffffff811f77ff>] sysfs_write_file+0xef/0x170
>> >>>> [ 1230.083476] [<ffffffff811879d3>] vfs_write+0xb3/0x180
>> >>>> [ 1230.083480] [<ffffffff81187cfa>] sys_write+0x4a/0x90
>> >>>> [ 1230.083483] [<ffffffff816a6e69>] system_call_fastpath+0x16/0x1b
>> >>>> [ 1230.083488] ---[ end trace 839cdd0078b3ce03 ]---
>> >>>>
>> >>>> Boot with init=/bin/bash
>> >>>> unload all modules except USBHID
>> >>>> echo core > /sys/power/pm_test
>> >>>> echo mem > /sys/power/state
>> >>>> system acts like it is going to sleep, and then wakes up a few seconds later
>> >>>> echo none > /sys/power/pm_test
>> >>>> echo mem > /sys/power/state
>> >>>> system goes to sleep
>> >>>> press power to resume = fails
>> >>>>
>> >>>> At this point I am stumped on how to debug. This is a "modern"
>> >>>> computer with no serial ports. It worked under Lucid, so I know it is
>> >>>> POSSIBLE.
>> >>>>
>> >>>> Mobo: ASRock X58 single-socket
>> >>>> CPU: Westmere 6 core (12 hyperthreads) 3.2 GHz
>> >>>> RAM: 12 GB ECC
>> >>>> Disk: sda = Intel SSD, mounted on /
>> >>>> Disk: sdb = Intel SSD, not mounted
>> >>>> Disk: sdc = Seagate HDD, not mounted
>> >>>> Disk: sdd = Seagate HDD, not mounted
>> >>>> NIC = Onboard RTL8168e/8111e
>> >>>> Sound = EMU1212 (emu10k1, not even configured yet)
>> >>>> Video = nVidia GeForce 7600 GT
>> >>>> KB = PS2 (also tried USB)
>> >>>> Mouse = USB
>> >>>>
>> >>>> I have not updated to a more current kernel than 3.5, but I will if
>> >>>> there's evidence that this is resolved. Any other clever trick to
>> >>>> try?
>> >>>
>> >>> There is no evidence and there won't be if you don't try a newer kernel.
>> >>>
>> >>> Thanks,
>> >>> Rafael
>> >>>
>> >>>
>> >>> --
>> >>> I speak only for myself.
>> >>> Rafael J. Wysocki, Intel Open Source Technology Center.