Re: [OOPS] 2.6.23-rc5 in tcp/net/nfsd

From: Neil Brown
Date: Tue Sep 11 2007 - 05:27:14 EST


On Tuesday September 11, mark@xxxxxxxxxxxxxx wrote:
> This oops appeared over night on a box running 2.6.23-rc5 (recent with the
> tcp_input.c fix).
>
> I can't find a similar one reported.


Okay..... this is weird.
>
> Mark
>
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 0000007e
                                                                           ^^^^^^^^

That is the bad address,

> EFLAGS: 00010246 (2.6.23-rc5-2-mcyrixiii #1)
> EIP is at ip_fragment+0x7f/0x680
> eax: c3c09c00 ebx: 00000000 ecx: b524d006 edx: 0000007b
                                            ^^^^^^^^^^^^^

It looks like an offset of 3 from edx. I got that from decoding:

> Code: .... <00> ba 03 00 00 00 bf a6 ff ff

which is
0: 00 ba 03 00 00 00 add %bh,0x3(%rdx)
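
For anyone who wants to check that decode by hand, here is a rough sketch in Python (not anything from the kernel tree). It only handles this one instruction form: opcode 00 with a ModRM byte using a 32-bit displacement. The helper name decode_add_rm8_r8 is made up for illustration. The box is 32-bit, so the base register is taken to be edx rather than the %rdx shown in the listing, and the edx value is copied from the register dump in the oops.

```python
import struct

# 8-bit register names indexed by the ModRM "reg" field (32-bit mode).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
# 32-bit register names indexed by the ModRM "rm" field.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_add_rm8_r8(code):
    """Decode 'add %r8, disp32(%r32)' only (hypothetical helper)."""
    opcode, modrm = code[0], code[1]
    mod = modrm >> 6          # addressing mode
    reg = (modrm >> 3) & 7    # source register
    rm = modrm & 7            # base register
    if opcode != 0x00 or mod != 2 or rm == 4:
        raise ValueError("sketch only handles opcode 00, mod=2, no SIB byte")
    # The displacement is a little-endian signed 32-bit value after ModRM.
    (disp,) = struct.unpack_from("<i", code, 2)
    return REG8[reg], REG32[rm], disp


# Bytes from the "Code:" line of the oops, starting at the faulting byte.
src, base, disp = decode_add_rm8_r8(bytes.fromhex("00 ba 03 00 00 00"))

edx = 0x0000007B              # edx from the register dump in the oops
fault_address = edx + disp

print(f"add %{src},{disp:#x}(%{base})")   # add %bh,0x3(%edx)
print(f"{fault_address:08x}")             # 0000007e
```

The computed address matches the one in the BUG line, which is why the fault looks like an access at edx + 3.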

However that instruction doesn't appear in ip_fragment.
The code in ip_fragment reads:
27: b9 04 00 00 00 mov $0x4,%ecx
                ^^
2c: ba 03 00 00 00 mov $0x3,%edx
    ^^^^^^^^^^^^^^

which contains the bytes of the offending instruction.
Note that $0x4 is ICMP_FRAG_NEEDED and $0x3 is ICMP_DEST_UNREACH:
these are args to icmp_send. So the latter is the correct disassembly
based on the C code.
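
The overlap between the two decodings can be checked mechanically. The sketch below concatenates the bytes of the two mov instructions quoted above and searches for the faulting byte sequence. The offsets 0x27 and 0x2c are the ones from that listing, not addresses from the oops itself, and the variable names are only illustrative.

```python
# Offset of the first mov in the disassembly listing quoted above.
LISTING_BASE = 0x27

# mov $0x4,%ecx followed by mov $0x3,%edx, as raw bytes.
code = bytes.fromhex("b9 04 00 00 00" "ba 03 00 00 00")

# The bytes the CPU was decoding when it faulted.
faulting = bytes.fromhex("00 ba 03 00 00 00")

pos = code.find(faulting)
entry = LISTING_BASE + pos

print(pos)            # 4
print(hex(entry))     # 0x2b
print(0x2C - entry)   # 1
```

So the faulting sequence begins at the last byte of the first mov, one byte short of the start of the second.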

So somehow the kernel is jumping to a bad address, part-way into an
instruction rather than at the start of one. I don't know how that
would happen.... maybe a single bit error in memory or a register???

NeilBrown