Re: lots of 2.2.4 oopses

jhosein@razorfish.com
Fri, 26 Mar 1999 11:24:23 -0500


I don't think it's egcs, I got this last night from our main file
server, we had just upgraded it to 2.2.4 around midnight. It was under
light load and in the process of dumping a 35GB ext2 filesystem to tape.
The machine is a Dual Pentium II 300 with 256MB of RAM.

[root@phoenix log]# uname -a
Linux phoenix 2.2.4 #2 SMP Thu Mar 25 20:21:36 EST 1999 i686 unknown
[root@phoenix log]# cat /proc/version
Linux version 2.2.4 (root@water) (gcc version 2.7.2.3) #2 SMP Thu Mar 25
20:21:36 EST 1999

Mar 26 04:07:02 phoenix kernel: Unable to handle kernel paging request
at virtual address 5a6b5247
Mar 26 04:07:02 phoenix kernel: current->tss.cr3 = 067da000, pr3 =
067da000
Mar 26 04:07:02 phoenix kernel: *pde = 00000000
Mar 26 04:07:02 phoenix kernel: Oops: 0000
Mar 26 04:07:02 phoenix kernel: CPU: 1
Mar 26 04:07:02 phoenix kernel: EIP: 0010:[<c0129502>]
Mar 26 04:07:02 phoenix kernel: EFLAGS: 00010216
Mar 26 04:07:02 phoenix kernel: eax: 5a6b5247 ebx: 0000b62a ecx:
c67d0808 edx: 5a6b5247
Mar 26 04:07:02 phoenix kernel: esi: 00000400 edi: 0000b62a ebp:
00000808 esp: c67d9e10
Mar 26 04:07:02 phoenix kernel: ds: 0018 es: 0018 ss: 0018
Mar 26 04:07:02 phoenix kernel: Process sort (pid: 1318, process nr: 95,
stackpage=c67d9000)
Mar 26 04:07:02 phoenix kernel: Stack: c0129531 00000808 0000b62a
00000400 c0129797 00000808 0000b62a 00000400
Mar 26 04:07:02 phoenix kernel: 0000b62a c359f320 c67d9f14
0000b62a c7b2d780 c0140f9d 00000808 0000b62a
Mar 26 04:07:02 phoenix kernel: 00000400 00000000 0000b62a
00000001 c359f320 00000008 c01415fd c359f320
Mar 26 04:07:02 phoenix kernel: Call Trace: [<c0129531>] [<c0129797>]
[<c0140f9d>] [<c01415fd>] [<c01418a9>] [<c013fbbe>] [<c013f9d4>]
Mar 26 04:07:02 phoenix kernel: [<c011d4b5>] [<c0105000>]
[<c011daca>] [<c010fd4e>] [<c0127f99>] [<c0109aec>]
Mar 26 04:07:02 phoenix kernel: Code: 8b 12 39 58 04 75 f3 39 70 08 75
ee 66 39 48 0c 75 e8 89 c2

Any thoughts?
-jdh

torvalds@transmeta.com (Linus Torvalds) Wrote:
> In article <36F92EBB.B33E673E@pobox.com>,
> Valient Gough <vgough@pobox.com> wrote:
> >
> >I upgraded from 2.2.3 to 2.2.4 last night. Since then I've had to
> >lockups. The second one managed to log "Unable to handle kernel paging
> >request at virtual address 00de3f9c" before it locked up, but I didn't
> >have the correct System.map file in place, so klogd didn't have symbols.
> >
> >But, this morning, I had 4 such oops in a row, with a correct
> >System.map.
>
> Ok, they are all basically the same oops, the file pointer list has
> gotten corrupted somehow, and that corruption will result in an oops
> whenever the kernel tries to touch one of the file pointers.
>
> The oops happens in remove_filp() (which is an inline function,
> explaining why the symbol table says it is in get_empty_filp()).
>
> The code looks like
>
> if(file->f_next)
> file->f_next->f_pprev = file->f_pprev;
> *file->f_pprev = file->f_next;
>
> and the thing oopses apparently because file->f_next is just crap (it
> appears to have the value 0x2c302c35, to be exact - seems to be the
> string "5,0," in fact).
>
> I would _much_ prefer it if you could try to compile 2.2.4 with
> gcc-2.7.2. We now have two reports of an oops like this, and in both
> cases egcs was used as the compiler. And egcs is doing alias analysis
> these days - and following the ANSI rules about aliases is something
> that we haven't validated that the kernel is doing correctly (and that's
> assuming that egcs is bug-free, which is obviously also a potential
> issue).
>
> Basically, right now I don't know whether this is (a) a real kernel bug
> (b) the kernel having some place it is not alias analysis aware and
> doing something ugly that breaks with egcs or (c) an egcs bug.
>
> Compiling with gcc-2.7.2 would at least resolve whether it is the
> compiler that induces the effect..
>
> Linus

-- 
 |   ||  |||  ||           r  a  z  o  r  f  i  s  h  , new york

jinnah dylan hosein [ systems administrator ]

>> (1) 212.966.5960

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/