2.6.25.x crashes, scheduling problems and access beyond end of device...

From: Matthias Dahl
Date: Mon May 12 2008 - 09:13:20 EST


Hello all.

I have recently updated from 2.6.24.4 to 2.6.25 and did a bios update at the
same time because of some problems I was experiencing. (I know bad practice
but I was short on time -g-) Ever since I was experiencing stability and
scheduling problems. Naturally I blamed the bios update first and tried
everything when I realized it was totally fine with 2.6.24.{4,7}. Since a lot
of fixes went into 2.6.25.{1,2,3} I updated to 2.6.25.3 and things did not
improve- at all.

When the last crash happened, the following made it to syslog before I got
another one a few seconds later which totally froze my machine.

May 11 15:54:58 dreamgate general protection fault: 0000 [1] PREEMPT SMP
May 11 15:54:58 dreamgate CPU 1
May 11 15:54:58 dreamgate Modules linked in: iptable_raw xt_comment xt_policy
ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP
ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_ah ipt_addrtype nf_nat_tftp
nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc
nf_nat_h323 nf_nat_ftp nf_conntrack_tftp nf_conntrack_sip
nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre
nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc
nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner
xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length
xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack
xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat
nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter
ip_tables x_tables it87 tun w83791d hwmon_vid fuse snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ext2
tda10023 nvidia(P) budget_av saa7146_vv videobuf_dma_sg videobuf_core
videodev v4l1_compat firmware_class snd_hda_intel budget_core dvb_core
snd_pcm saa7146 forcedeth snd_timer k8temp snd_page_alloc ttpci_eeprom
snd_hwdep snd soundcore i2c_nforce2
May 11 15:54:58 dreamgate Pid: 18037, comm: mc Tainted: P 2.6.25.3 #1
May 11 15:54:58 dreamgate RIP: 0010:[<ffffffff80266efd>] [<ffffffff80266efd>]
find_get_page+0x3d/0x80
May 11 15:54:58 dreamgate RSP: 0018:ffff810076ca7cc8 EFLAGS: 00010006
May 11 15:54:58 dreamgate RAX: 0200000000000000 RBX: 0200000000000000 RCX:
ffffffffffffffff
May 11 15:54:58 dreamgate RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff810055a541b8
May 11 15:54:58 dreamgate RBP: ffff810076ca7ce8 R08: 0000000000000001 R09:
0000000000000080
May 11 15:54:58 dreamgate R10: 0000000000000008 R11: 0000000000000001 R12:
0000000000000000
May 11 15:54:58 dreamgate R13: ffff810055a541c8 R14: 0000000000000001 R15:
0000000000000000
May 11 15:54:58 dreamgate FS: 00007fc3c327e710(0000) GS:ffff81007f80e380
(0000) knlGS:00000000f7c476c0
May 11 15:54:58 dreamgate CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 11 15:54:58 dreamgate CR2: 00000000006b9c34 CR3: 0000000076cdb000 CR4:
00000000000006e0
May 11 15:54:58 dreamgate DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
May 11 15:54:58 dreamgate DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
May 11 15:54:58 dreamgate Process mc (pid: 18037, threadinfo ffff810076ca6000,
task ffff810007940000)
May 11 15:54:58 dreamgate Stack: ffff810076ca7d08 ffff810076ca7ed8
0000000000000000 ffff810055a541b0
May 11 15:54:58 dreamgate ffff810076ca7dd8 ffffffff802690bc ffff81002ff2fba8
ffff810076ca7e68
May 11 15:54:58 dreamgate ffffffff805f4140 ffffffff805f4170 ffff810076ca7de8
ffff81002ff2fb40
May 11 15:54:58 dreamgate Call Trace:
May 11 15:54:58 dreamgate [<ffffffff802690bc>]
generic_file_aio_read+0x22c/0x620
May 11 15:54:58 dreamgate [<ffffffff80290311>] do_sync_read+0xf1/0x130
May 11 15:54:58 dreamgate [<ffffffff8024c6b6>] ? remove_wait_queue+0x46/0x60
May 11 15:54:58 dreamgate [<ffffffff8022cd3e>] ? __wake_up+0x4e/0x70
May 11 15:54:58 dreamgate [<ffffffff8024c480>] ?
autoremove_wake_function+0x0/0x40
May 11 15:54:58 dreamgate [<ffffffff80362651>] ? tty_ldisc_deref+0x61/0x80
May 11 15:54:58 dreamgate [<ffffffff80364eee>] ? tty_write+0x22e/0x240
May 11 15:54:58 dreamgate [<ffffffff80241cdd>] ? do_sigaction+0xfd/0x190
May 11 15:54:58 dreamgate [<ffffffff80290bd4>] vfs_read+0xc4/0x160
May 11 15:54:58 dreamgate [<ffffffff80291070>] sys_read+0x50/0x90
May 11 15:54:58 dreamgate [<ffffffff8020b5eb>]
system_call_after_swapgs+0x7b/0x80
May 11 15:54:58 dreamgate
May 11 15:54:58 dreamgate
May 11 15:54:58 dreamgate Code: 5d e8 48 89 fb 4c 89 65 f0 49 89 f4 48 83 c3
08 4c 89 ef e8 c6 b3 24 00 48 89 df 4c 89 e6 e8 6b 7d 0b 00 48 85 c0 48 89 c3
74 17 <48> 8b 00 48 89 da 25 00 40 02 00 48 3d 00 40 02 00 74 1d f0 ff
May 11 15:54:58 dreamgate RIP [<ffffffff80266efd>] find_get_page+0x3d/0x80
May 11 15:54:58 dreamgate RSP <ffff810076ca7cc8>
May 11 15:54:58 dreamgate ---[ end trace 79f0890914a5e79c ]---
May 11 15:54:58 dreamgate note: mc[18037] exited with preempt_count 1
May 11 15:54:58 dreamgate BUG: scheduling while atomic: mc/18037/0x00000002
May 11 15:54:58 dreamgate Pid: 18037, comm: mc Tainted: P D 2.6.25.3 #1
May 11 15:54:58 dreamgate
May 11 15:54:58 dreamgate Call Trace:
May 11 15:54:58 dreamgate [<ffffffff80230105>] __schedule_bug+0x65/0x70
May 11 15:54:58 dreamgate [<ffffffff804b0256>] thread_return+0x2eb/0x535
May 11 15:54:58 dreamgate [<ffffffff804b1faf>] __down+0x9f/0x112
May 11 15:54:58 dreamgate [<ffffffff8022d810>] ?
default_wake_function+0x0/0x10
May 11 15:54:58 dreamgate [<ffffffff8026fa9b>] ? release_pages+0x18b/0x1d0
May 11 15:54:58 dreamgate [<ffffffff804b1c4b>] __down_failed+0x35/0x3a
May 11 15:54:58 dreamgate [<ffffffff804b2769>] ? lock_kernel+0x29/0x30
May 11 15:54:58 dreamgate [<ffffffff803659a1>] tty_release+0x11/0x30
May 11 15:54:58 dreamgate [<ffffffff802915e3>] __fput+0xb3/0x1a0
May 11 15:54:58 dreamgate [<ffffffff802918e6>] fput+0x16/0x20
May 11 15:54:58 dreamgate [<ffffffff8028e4b6>] filp_close+0x56/0x90
May 11 15:54:58 dreamgate [<ffffffff80239171>] put_files_struct+0xc1/0xd0
May 11 15:54:58 dreamgate [<ffffffff802391ce>] __exit_files+0x4e/0x60
May 11 15:54:58 dreamgate [<ffffffff8023a6f7>] do_exit+0x1c7/0x810
May 11 15:54:58 dreamgate [<ffffffff8022cd3e>] ? __wake_up+0x4e/0x70
May 11 15:54:58 dreamgate [<ffffffff8020c8e5>] oops_end+0x85/0x90
May 11 15:54:58 dreamgate [<ffffffff8020d67e>] die+0x5e/0x90
May 11 15:54:58 dreamgate [<ffffffff8020e308>]
do_general_protection+0x148/0x150
May 11 15:54:58 dreamgate [<ffffffff804b2869>] error_exit+0x0/0x51
May 11 15:54:58 dreamgate [<ffffffff80266efd>] ? find_get_page+0x3d/0x80
May 11 15:54:58 dreamgate [<ffffffff80266ef5>] ? find_get_page+0x35/0x80
May 11 15:54:58 dreamgate [<ffffffff802690bc>] ?
generic_file_aio_read+0x22c/0x620
May 11 15:54:58 dreamgate [<ffffffff80290311>] ? do_sync_read+0xf1/0x130
May 11 15:54:58 dreamgate [<ffffffff8024c6b6>] ? remove_wait_queue+0x46/0x60
May 11 15:54:58 dreamgate [<ffffffff8022cd3e>] ? __wake_up+0x4e/0x70
May 11 15:54:58 dreamgate [<ffffffff8024c480>] ?
autoremove_wake_function+0x0/0x40
May 11 15:54:58 dreamgate [<ffffffff80362651>] ? tty_ldisc_deref+0x61/0x80
May 11 15:54:58 dreamgate [<ffffffff80364eee>] ? tty_write+0x22e/0x240
May 11 15:54:58 dreamgate [<ffffffff80241cdd>] ? do_sigaction+0xfd/0x190
May 11 15:54:58 dreamgate [<ffffffff80290bd4>] ? vfs_read+0xc4/0x160
May 11 15:54:58 dreamgate [<ffffffff80291070>] ? sys_read+0x50/0x90
May 11 15:54:58 dreamgate [<ffffffff8020b5eb>] ?
system_call_after_swapgs+0x7b/0x80

The second trace I got a few seconds later froze the machine hard, so I was
unable to login remotely and get it saved besides that it was too long to fit
on 25 lines. :-( The crashes happen more or less randomly at least at first
sight. One thing is for sure though, the machine has to be under load for
such a crash to happen. I managed to crash the machine yesterday by starting
some big compilation jobs, meanwhile having two burnK7 stress the cpu even
more and have some audio playback streamed from the net. I never had those
crashes with .24.x before nor was I able to provoke one yesterday doing the
same things. (the same cpu sane temperatures and ram tested 12+ hours with
memtest)

That brings me to the next issue: scheduling seems to be much worse with .25.x
because I can easily produce audio skips and longer latency in X. Having some
medium sized background io going on, is _very_ noticable with .25.x whereas I
do not notice a thing with .24.x. :( The system generally feels somewhat
sluggish whereas .24.x is just the way it was meant to be.(tm) The stress
above mentioned makes the system almost unusable with .25.x where as .24.x
feels loaded but still works ok even the audio playback.

Last but not least, the following problem I was unable to reproduce but it
never happened before and a full fs check revealed nothing.

May 10 15:09:45 dreamgate attempt to access beyond end of device
May 10 15:09:45 dreamgate md5: rw=1, want=6057341499062899736, limit=972671360
May 10 15:09:45 dreamgate Buffer I/O error on device md5, logical block
31197826
May 10 15:09:45 dreamgate lost page write due to I/O error on md5

This was with kernel .25.3 only.

I have attached both the .24.7 and 25.3 config files as well as their
respective kernel output and some more(commented) informations that might be
useful.

My system is an Asus Crosshair (AM2) with an AMD Athlon64 X2 5600+ (Windsor)
running an up2date gentoo x86_64 system with glibc 2.7 and gcc 4.2.3.

If you need any more informations or if there is any way I can help nail this
bug, please let me know. I'd have done a git bisect but I lack the time to
work myself through the _huge_ list of commits with all the recompiling and
stress testing. Sorry. :-(

Thanks in advance...!

Best regards,
Matthias Dahl

PS. Please cc' me when replying because I am not subscribed to the kernel
mailing list.

Attachment: dmesg-25.3.gz
Description: GNU Zip compressed data

Attachment: info-25.3.gz
Description: GNU Zip compressed data

Attachment: info-24.7.gz
Description: GNU Zip compressed data

Attachment: dmesg.24.7.gz
Description: GNU Zip compressed data

Attachment: k24.7.config.gz
Description: GNU Zip compressed data

Attachment: k25.3.config.gz
Description: GNU Zip compressed data