Re: Linux 2.6.11-rc1

From: Terence Ripperda
Date: Wed Jan 12 2005 - 20:30:34 EST



This looks suspiciously like a kernel bug that Andi Kleen and I are
currently investigating. He has fixes in the bk branches. I would have
expected them to be in such a new kernel, but I noticed that you're
using an x86 kernel, and his initial fixes were only in the x86_64
arch files.

change_page_attr() has a bookkeeping bug that, surprisingly, hadn't
caused problems until recently (tracking down what made it suddenly
start triggering is on my todo list).

The x86_64 change in bk is linked below, but the only thing you really
need is the 'get_page' fix. You should be able to manually edit
__change_page_attr() in linux/arch/i386/mm/pageattr.c, update that
single line, and be fine:

http://linux.bkbits.net:8080/linux-2.6/diffs/arch/x86_64/mm/pageattr.c@xxxx?nav=index.html|src/|src/arch|src/arch/x86_64|src/arch/x86_64/mm|hist/arch/x86_64/mm/pageattr.c
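
For anyone unfamiliar with what "bookkeeping" means in this context,
here is a rough user-space sketch of the general pattern as I'd assume
it applies: the page backing a page table carries a use count that goes
up when an entry is given non-default attributes and down when the
entry is reverted, and a missed get_page-style increment leaves the
count out of step. Everything in it (struct table_page, set_attr,
set_attr_missing_get, the 4-entry table) is made up for illustration.
It is not the pageattr.c code and not the actual fix, which is in the
bk diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny on purpose; a real page table has far more entries. */
#define ENTRIES_PER_TABLE 4

/* Hypothetical stand-in for the page backing one page table. */
struct table_page {
    int refcount;                        /* 1 = base reference only */
    bool nondefault[ENTRIES_PER_TABLE];  /* entries with changed attributes */
};

void table_init(struct table_page *t)
{
    t->refcount = 1;
    for (size_t i = 0; i < ENTRIES_PER_TABLE; i++)
        t->nondefault[i] = false;
}

/* Correct version: every change away from the default takes a reference,
 * every revert to the default drops one. */
void set_attr(struct table_page *t, size_t idx, bool make_nondefault)
{
    assert(idx < ENTRIES_PER_TABLE);
    if (make_nondefault && !t->nondefault[idx]) {
        t->nondefault[idx] = true;
        t->refcount++;                   /* the "get" side */
    } else if (!make_nondefault && t->nondefault[idx]) {
        t->nondefault[idx] = false;
        t->refcount--;                   /* the "put" side */
    }
}

/* Deliberately broken variant: the "get" is missing, the "put" is not. */
void set_attr_missing_get(struct table_page *t, size_t idx, bool make_nondefault)
{
    assert(idx < ENTRIES_PER_TABLE);
    if (make_nondefault && !t->nondefault[idx]) {
        t->nondefault[idx] = true;       /* no refcount++ here */
    } else if (!make_nondefault && t->nondefault[idx]) {
        t->nondefault[idx] = false;
        t->refcount--;
    }
}

/* Invariant: refcount == 1 + number of non-default entries. */
bool bookkeeping_consistent(const struct table_page *t)
{
    int expected = 1;
    for (size_t i = 0; i < ENTRIES_PER_TABLE; i++)
        if (t->nondefault[i])
            expected++;
    return t->refcount == expected;
}
```

With set_attr(), changing two entries and reverting them brings
refcount back to 1, and bookkeeping_consistent() holds throughout. With
set_attr_missing_get(), one change followed by one revert leaves
refcount at 0, which is the kind of imbalance a later sanity check
would trip over.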

Thanks,
Terence

On Thu, Jan 13, 2005 at 02:13:28AM +0100, lista1@xxxxxxxxx wrote:
> On Wed, 12 Jan 2005 09:52:38 +0100 Voluspa wrote:
>
> > Yes, tainted. X black screen, no keyboard. Power button to turn off. I
> > really don't feel like compiling lots of debug-kernels to chase this,
> > unless someone is really interested, which I doubt.
>
> A fyi directed to other users: It was the nvidia module. Xorg nv is
> fine. Did a CONFIG_KALLSYMS=y kernel and am attaching the output.
>
> Turning on relevant debugging options in "Kernel hacking" hides the bug
> (ie prevents it from happening). Very annoying and is a behaviour I've
> seen before with other bugs. The price of kernel complexity, I guess.
>
> --
> Mvh
> Mats Johannesson

>
>
> Jan 13 01:09:42 loke kernel: ------------[ cut here ]------------
> Jan 13 01:09:42 loke kernel: kernel BUG at <bad filename>:395!
> Jan 13 01:09:42 loke kernel: invalid operand: 0000 [#1]
> Jan 13 01:09:42 loke kernel: PREEMPT
> Jan 13 01:09:42 loke kernel: Modules linked in: nvidia 8139too crc32 snd_seq_oss snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_seq snd_pcm_oss snd_mixer_oss snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore
> Jan 13 01:09:42 loke kernel: CPU: 0
> Jan 13 01:09:42 loke kernel: EIP: 0060:[__change_page_attr+169/286] Tainted: P VLI
> Jan 13 01:09:42 loke kernel: EFLAGS: 00213002 (2.6.11-rc1)
> Jan 13 01:09:42 loke kernel: EIP is at __change_page_attr+0xa9/0x11e
> Jan 13 01:09:42 loke kernel: eax: 054001e3 ebx: 05530000 ecx: c1006c40 edx: 054001e3
> Jan 13 01:09:42 loke kernel: esi: c0362c54 edi: 00000163 ebp: c1000000 esp: ca147db4
> Jan 13 01:09:42 loke kernel: ds: 007b es: 007b ss: 0068
> Jan 13 01:09:42 loke kernel: Process X (pid: 575, threadinfo=ca147000 task=cd4b4060)
> Jan 13 01:09:42 loke kernel: Stack: c5530000 c10aa600 00000010 00000000 00203246 c010db9d 00000163 00000011
> Jan 13 01:09:42 loke kernel: c10aa400 ce2eda60 cd54fc00 ca147e20 c010d883 00010000 d0a80000 d0f75bec
> Jan 13 01:09:42 loke kernel: d0d8a180 d0a80000 00010000 cd54fc00 d0d8a173 cff08400 cd54f400 ca147e60
> Jan 13 01:09:42 loke kernel: Call Trace:
> Jan 13 01:09:42 loke kernel: [change_page_attr+48/97] change_page_attr+0x30/0x61
> Jan 13 01:09:42 loke kernel: [iounmap+101/118] iounmap+0x65/0x76
> Jan 13 01:09:42 loke kernel: [pg0+280927212/1070015488] os_unmap_kernel_space+0x9/0xa [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278913408/1070015488] _nv001706rm+0x20/0x2c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278913395/1070015488] _nv001706rm+0x13/0x2c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278898481/1070015488] _nv002359rm+0xe9/0x184 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278862458/1070015488] _nv001955rm+0x36/0xe0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278935092/1070015488] _nv001297rm+0x9c/0xa8 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278940044/1070015488] rm_teardown_agp+0x48/0x50 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278862458/1070015488] _nv001955rm+0x36/0xe0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+280919192/1070015488] nv_agp_teardown+0x4c/0x7c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278922699/1070015488] _nv001708rm+0x73/0xa0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278865062/1070015488] _nv001847rm+0x26/0x2c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278925180/1070015488] _nv000650rm+0x58/0xcc [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278925271/1070015488] _nv000650rm+0xb3/0xcc [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278926961/1070015488] _nv001362rm+0x71/0xb0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278926974/1070015488] _nv001362rm+0x7e/0xb0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278914238/1070015488] _nv001820rm+0x12/0x18 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278937831/1070015488] rm_disable_adapter+0x2f/0x8c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278937879/1070015488] rm_disable_adapter+0x5f/0x8c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278937867/1070015488] rm_disable_adapter+0x53/0x8c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+280910374/1070015488] nv_kern_close+0x168/0x1e1 [nvidia]
> Jan 13 01:09:42 loke kernel: [__fput+61/228] __fput+0x3d/0xe4
> Jan 13 01:09:42 loke kernel: [filp_close+89/95] filp_close+0x59/0x5f
> Jan 13 01:09:42 loke kernel: [sysenter_past_esp+82/117] sysenter_past_esp+0x52/0x75
> Jan 13 01:09:42 loke kernel: Code: 56 37 c0 c1 f9 05 c1 e1 0c 0b 0d c8 62 2c c0 e8 10 ff ff ff 89 d9 ff 41 04 eb 12 a9 80 00 00 00 75 09 09 fb 89 1e ff 49 04 eb 02 <0f> 0b 8b 01 f6 c4 08 75 64 8b 41 04 40 75 02 0f 0b a1 0c 4e 2c
> Jan 13 01:09:42 loke kernel: <6>note: X[575] exited with preempt_count 1
> Jan 13 01:09:42 loke kernel: scheduling while atomic: X/0x00000001/575
> Jan 13 01:09:42 loke kernel: [schedule+64/1068] schedule+0x40/0x42c
> Jan 13 01:09:42 loke kernel: [autoremove_wake_function+0/45] autoremove_wake_function+0x0/0x2d
> Jan 13 01:09:42 loke kernel: [__sched_text_start+134/237] __down+0x86/0xed
> Jan 13 01:09:42 loke kernel: [default_wake_function+0/12] default_wake_function+0x0/0xc
> Jan 13 01:09:42 loke kernel: [release_mem+446/458] release_mem+0x1be/0x1ca
> Jan 13 01:09:42 loke kernel: [__down_failed+7/12] __down_failed+0x7/0xc
> Jan 13 01:09:42 loke kernel: [pg0+280927615/1070015488] .text.lock.os_interface+0x7/0x18 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278914238/1070015488] _nv001820rm+0x12/0x18 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278939386/1070015488] rm_free_unused_clients+0x2e/0x88 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278939410/1070015488] rm_free_unused_clients+0x46/0x88 [nvidia]
> Jan 13 01:09:42 loke kernel: [dput+27/459] dput+0x1b/0x1cb
> Jan 13 01:09:42 loke kernel: [pg0+280913993/1070015488] nv_kern_ctl_close+0x77/0xa1 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+280910265/1070015488] nv_kern_close+0xfb/0x1e1 [nvidia]
> Jan 13 01:09:42 loke kernel: [destroy_inode+34/49] destroy_inode+0x22/0x31
> Jan 13 01:09:42 loke kernel: [__fput+61/228] __fput+0x3d/0xe4
> Jan 13 01:09:42 loke kernel: [filp_close+89/95] filp_close+0x59/0x5f
> Jan 13 01:09:42 loke kernel: [put_files_struct+86/169] put_files_struct+0x56/0xa9
> Jan 13 01:09:42 loke kernel: [do_exit+257/703] do_exit+0x101/0x2bf
> Jan 13 01:09:42 loke kernel: [do_trap+0/162] do_trap+0x0/0xa2
> Jan 13 01:09:42 loke kernel: [do_invalid_op+0/139] do_invalid_op+0x0/0x8b
> Jan 13 01:09:42 loke kernel: [do_invalid_op+127/139] do_invalid_op+0x7f/0x8b
> Jan 13 01:09:42 loke kernel: [__change_page_attr+169/286] __change_page_attr+0xa9/0x11e
> Jan 13 01:09:42 loke kernel: [kobject_get+15/19] kobject_get+0xf/0x13
> Jan 13 01:09:42 loke kernel: [get_device+14/20] get_device+0xe/0x14
> Jan 13 01:09:42 loke kernel: [pci_dev_get+15/19] pci_dev_get+0xf/0x13
> Jan 13 01:09:42 loke kernel: [pci_get_subsys+174/206] pci_get_subsys+0xae/0xce
> Jan 13 01:09:42 loke kernel: [pci_get_device+11/14] pci_get_device+0xb/0xe
> Jan 13 01:09:42 loke kernel: [pg0+280926374/1070015488] os_pci_init_handle+0x7d/0x86 [nvidia]
> Jan 13 01:09:42 loke kernel: [pci_read+28/33] pci_read+0x1c/0x21
> Jan 13 01:09:42 loke kernel: [error_code+43/48] error_code+0x2b/0x30
> Jan 13 01:09:42 loke kernel: [__change_page_attr+169/286] __change_page_attr+0xa9/0x11e
> Jan 13 01:09:42 loke kernel: [change_page_attr+48/97] change_page_attr+0x30/0x61
> Jan 13 01:09:42 loke kernel: [iounmap+101/118] iounmap+0x65/0x76
> Jan 13 01:09:42 loke kernel: [pg0+280927212/1070015488] os_unmap_kernel_space+0x9/0xa [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278913408/1070015488] _nv001706rm+0x20/0x2c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278913395/1070015488] _nv001706rm+0x13/0x2c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278898481/1070015488] _nv002359rm+0xe9/0x184 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278862458/1070015488] _nv001955rm+0x36/0xe0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278935092/1070015488] _nv001297rm+0x9c/0xa8 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278940044/1070015488] rm_teardown_agp+0x48/0x50 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278862458/1070015488] _nv001955rm+0x36/0xe0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+280919192/1070015488] nv_agp_teardown+0x4c/0x7c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278922699/1070015488] _nv001708rm+0x73/0xa0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278865062/1070015488] _nv001847rm+0x26/0x2c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278925180/1070015488] _nv000650rm+0x58/0xcc [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278925271/1070015488] _nv000650rm+0xb3/0xcc [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278926961/1070015488] _nv001362rm+0x71/0xb0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278926974/1070015488] _nv001362rm+0x7e/0xb0 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278914238/1070015488] _nv001820rm+0x12/0x18 [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278937831/1070015488] rm_disable_adapter+0x2f/0x8c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278937879/1070015488] rm_disable_adapter+0x5f/0x8c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+278937867/1070015488] rm_disable_adapter+0x53/0x8c [nvidia]
> Jan 13 01:09:42 loke kernel: [pg0+280910374/1070015488] nv_kern_close+0x168/0x1e1 [nvidia]
> Jan 13 01:09:42 loke kernel: [__fput+61/228] __fput+0x3d/0xe4
> Jan 13 01:09:42 loke kernel: [filp_close+89/95] filp_close+0x59/0x5f
> Jan 13 01:09:42 loke kernel: [sysenter_past_esp+82/117] sysenter_past_esp+0x52/0x75
>
