Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)

From: Michael Breuer
Date: Fri Jan 20 2012 - 09:24:21 EST


On 1/16/2012 11:39 AM, Michael Breuer wrote:
Synopsis:

Receiving DMAR and other errors after approximately three days of uptime. The symptoms exactly match errors seen and then fixed around 2.6.32.4.

Although the system runs for too long before failing for a bisect to be practical, I was able to confirm that the problem exists in the 3.1 stable branch (I jumped from 3.0 to 3.2 when 3.2 was released).

For now I reverted to the sky2.c from 3.0.9 and am running the rest of the kernel from 3.1.2, but won't be certain that this works until later in the week.

Note that in the 20 seconds before the log extract below there were DHCP renewal attempts on eth1; the issue below was on eth0. I'm not sure it's relevant, but back in 2010 a preceding DHCP event did turn out to be relevant to the manifestation of the bug.

The 3.2.1-dirty I'm running is from git with a single local patch - for sidewinder force-feedback support (shouldn't be relevant to the sky2 issue).

Log extract:

Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault status reg 2
Jan 16 05:49:46 mail kernel: [198230.628925] sky2 0000:06:00.0: error interrupt status=0x80000000
Jan 16 05:49:46 mail kernel: [198230.628929] DMAR:[DMA Read] Request device [06:00.0] fault addr fff78000
Jan 16 05:49:46 mail kernel: [198230.628931] DMAR:[fault reason 06] PTE Read access is not set
Jan 16 05:49:46 mail kernel: [198230.628939] sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 16 05:49:53 mail dhclient[1616]: DHCPREQUEST on eth1 to 10.240.184.29 port 67
Jan 16 05:50:01 mail kernel: [198246.288400] ------------[ cut here ]------------
Jan 16 05:50:01 mail kernel: [198246.288408] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x247/0x250()
Jan 16 05:50:01 mail kernel: [198246.288411] Hardware name: System Product Name
Jan 16 05:50:01 mail kernel: [198246.288413] NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Jan 16 05:50:01 mail kernel: [198246.288415] Modules linked in: tcp_lp cpufreq_stats ebtable_nat ebtables nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6table_filter ip6_tables iptable_mangle ipt_MASQUERADE iptable_nat nf_nat iptable_raw tun bridge stp llc lockd sit tunnel4 ipt_LOG nf_conntrack_ftp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CHECKSUM xt_multiport xt_DSCP w83627ehf xt_mark xt_dscp hwmon_vid binfmt_misc raid1 btrfs sunrpc zlib_deflate libcrc32c snd_hda_codec_analog snd_ens1371 gameport snd_hda_intel snd_rawmidi snd_ac97_codec snd_hda_codec snd_hwdep ac97_bus snd_seq snd_seq_device snd_pcm gspca_spca505 snd_timer gspca_main snd videodev media soundcore i2c_i801 iTCO_wdt microcode v4l2_compat_ioctl32 snd_page_alloc i7core_edac sky2 edac_core pcspkr iTCO_vendor_support virtio_net virtio virtio_ring kvm_intel kvm uinput ipv6 raid456 async_raid6_recov async_pq raid6_pq async_xor firewire_ohci firewire_core pata_acpi ata_generic xor async_memcpy async_tx crc_itu_t pata_marvell nouveau ttm d
Jan 16 05:50:01 mail kernel: rm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi video [last unloaded: nf_conntrack_broadcast]
Jan 16 05:50:01 mail kernel: [198246.288487] Pid: 0, comm: swapper/0 Tainted: G W 3.2.1-dirty #1
Jan 16 05:50:01 mail kernel: [198246.288489] Call Trace:
Jan 16 05:50:01 mail kernel: [198246.288491] <IRQ> [<ffffffff81050a4f>] warn_slowpath_common+0x7f/0xc0
Jan 16 05:50:01 mail kernel: [198246.288501] [<ffffffff8101f0bd>] ? lapic_next_event+0x1d/0x30
Jan 16 05:50:01 mail kernel: [198246.288504] [<ffffffff81050b46>] warn_slowpath_fmt+0x46/0x50
Jan 16 05:50:01 mail kernel: [198246.288509] [<ffffffff81009319>] ? read_tsc+0x9/0x20
Jan 16 05:50:01 mail kernel: [198246.288513] [<ffffffff814a81e7>] dev_watchdog+0x247/0x250
Jan 16 05:50:01 mail kernel: [198246.288518] [<ffffffff8105fbbb>] run_timer_softirq+0x12b/0x3b0
Jan 16 05:50:01 mail kernel: [198246.288521] [<ffffffff814a7fa0>] ? qdisc_reset+0x50/0x50
Jan 16 05:50:01 mail kernel: [198246.288525] [<ffffffff81057d18>] __do_softirq+0xa8/0x210
Jan 16 05:50:01 mail kernel: [198246.288529] [<ffffffff8157496c>] call_softirq+0x1c/0x30
Jan 16 05:50:01 mail kernel: [198246.288533] [<ffffffff810041e5>] do_softirq+0x65/0xa0
Jan 16 05:50:01 mail kernel: [198246.288536] [<ffffffff810580fe>] irq_exit+0x8e/0xb0
Jan 16 05:50:01 mail kernel: [198246.288539] [<ffffffff815750a3>] do_IRQ+0x63/0xe0
Jan 16 05:50:01 mail kernel: [198246.288543] [<ffffffff8156ad2e>] common_interrupt+0x6e/0x6e
Jan 16 05:50:01 mail kernel: [198246.288545] <EOI> [<ffffffff81307b6d>] ? intel_idle+0xed/0x150
Jan 16 05:50:01 mail kernel: [198246.288551] [<ffffffff81307b4f>] ? intel_idle+0xcf/0x150
Jan 16 05:50:01 mail kernel: [198246.288555] [<ffffffff8144d331>] cpuidle_idle_call+0xc1/0x280
Jan 16 05:50:01 mail kernel: [198246.288559] [<ffffffff8100122a>] cpu_idle+0xca/0x120
Jan 16 05:50:01 mail kernel: [198246.288563] [<ffffffff8154741e>] rest_init+0x72/0x74
Jan 16 05:50:01 mail kernel: [198246.288568] [<ffffffff81b6abdd>] start_kernel+0x3b5/0x3c0
Jan 16 05:50:01 mail kernel: [198246.288572] [<ffffffff81b6a32b>] x86_64_start_reservations+0x132/0x136
Jan 16 05:50:01 mail kernel: [198246.288576] [<ffffffff81b6a140>] ? early_idt_handlers+0x140/0x140
Jan 16 05:50:01 mail kernel: [198246.288580] [<ffffffff81b6a431>] x86_64_start_kernel+0x102/0x111
Jan 16 05:50:01 mail kernel: [198246.288583] ---[ end trace bb26011d21a2b1d7 ]---
Jan 16 05:50:01 mail kernel: [198246.288586] sky2 0000:06:00.0: eth0: tx timeout
Jan 16 05:50:01 mail kernel: [198246.288593] sky2 0000:06:00.0: eth0: transmit ring 115 .. 10 report=115 done=115
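
For anyone scanning their own logs for the same signature, here is a minimal sketch that pulls DMAR fault lines out of syslog output and converts the kernel timestamp to days of uptime. The function name find_dmar_faults and the regular expressions are illustrative assumptions based only on the line format in the extract above; other kernel versions may print these messages differently.

```python
import re

# Syslog-wrapped kernel line, e.g.
#   "Jan 16 05:49:46 mail kernel: [198230.628929] DRHD: handling fault status reg 2"
# The bracketed number is seconds since boot.
KERNEL_LINE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<msg>.*)$")

# DMAR fault line as it appears in the extract above (format assumed from that log only).
DMAR_FAULT = re.compile(
    r"DMAR:\[(?P<kind>DMA (?:Read|Write))\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)

SECONDS_PER_DAY = 86400.0


def find_dmar_faults(lines):
    """Return one dict per DMAR fault line found in an iterable of log lines."""
    faults = []
    for line in lines:
        kernel = KERNEL_LINE.search(line)
        if kernel is None:
            continue  # not a timestamped kernel message (e.g. dhclient output)
        fault = DMAR_FAULT.search(kernel.group("msg"))
        if fault is None:
            continue
        faults.append({
            "uptime_days": float(kernel.group("uptime")) / SECONDS_PER_DAY,
            "kind": fault.group("kind"),
            "device": fault.group("dev"),
            "addr": int(fault.group("addr"), 16),
        })
    return faults


# Three lines copied from the log extract above.
SAMPLE = [
    "Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault status reg 2",
    "Jan 16 05:49:46 mail kernel: [198230.628929] DMAR:[DMA Read] Request device [06:00.0] fault addr fff78000",
    "Jan 16 05:49:53 mail dhclient[1616]: DHCPREQUEST on eth1 to 10.240.184.29 port 67",
]

if __name__ == "__main__":
    for f in find_dmar_faults(SAMPLE):
        print(f"{f['kind']} fault from {f['device']} at {f['addr']:#x} "
              f"after {f['uptime_days']:.2f} days of uptime")
```

Run against the extract, this reports the single DMA Read fault from 06:00.0 and puts the failure at a bit under two and a half days after boot.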



FYI: the system has now been up for four days without issues, running 3.2.1 with sky2.c from 3.0.9. It looks like the issue is in fact in one of the changes made to sky2.c between those two releases.

--
Mike