Re: wrong data byte #8 should be 0x8 but was...

From: Ragnar Kjørstad (kernel@ragnark.vestdata.no)
Date: Sat Feb 19 2000 - 09:02:39 EST


On Fri, Feb 18, 2000 at 08:28:43PM +0100, Jesper Juhl wrote:
> Hi again,
>
> I just found some more info that may (or may not) help to identify the
> source of the problem outlined in my previous post (quoted below).
> I just typed "dmesg" and saw the following at the end of the listing:
>
> ping(776): unaligned trap at 0000000120002b04: 0000000120118c24 29 1
> ping(776): unaligned trap at 0000000120002b2c: 0000000120118c1c 29 2
> traceroute(785): unaligned trap at 0000000120002a9c: 000000012010613c 2d 6
> traceroute(785): unaligned trap at 0000000120002aa0: 0000000120106134 2d 5
>
> I guess the above is related to the problem, but I don't know what it means.

I'm on thin ice here, but I think the messages are caused by programs
trying to access a long that is not aligned on an 8-byte memory address.
I believe the kernel automagically works around the problem somehow, and
that the messages are just warnings telling you the program should be
fixed.
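
A quick way to check that guess against the lines you quoted is to take
each reported address modulo 8. The sketch below assumes the second hex
field of each line is the address the program tried to access, and that
the next field is the instruction opcode; if I remember the Alpha
opcodes right, 29 and 2d are the 8-byte load and store (ldq/stq), but
treat that as unverified. The helper name misalignments is made up for
this example.

```python
import re

# Lines copied from the quoted dmesg output.
DMESG = """\
ping(776): unaligned trap at 0000000120002b04: 0000000120118c24 29 1
ping(776): unaligned trap at 0000000120002b2c: 0000000120118c1c 29 2
traceroute(785): unaligned trap at 0000000120002a9c: 000000012010613c 2d 6
traceroute(785): unaligned trap at 0000000120002aa0: 0000000120106134 2d 5
"""

# Assumed layout: name(pid): unaligned trap at <pc>: <address> <opcode> <reg>
PATTERN = re.compile(
    r"^(\w+)\((\d+)\): unaligned trap at ([0-9a-f]+): "
    r"([0-9a-f]+) ([0-9a-f]+) (\d+)$"
)


def misalignments(text, size=8):
    """Return (program, address, address % size) for each trap line."""
    result = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            address = int(match.group(4), 16)
            result.append((match.group(1), address, address % size))
    return result


for program, address, offset in misalignments(DMESG):
    print(f"{program}: {address:#x} is {offset} bytes past an 8-byte boundary")
```

All four addresses come out 4 bytes past an 8-byte boundary, which fits
the idea of an 8-byte access to data that is only 4-byte aligned.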

Anyway - we had lots of unaligned trap messages on our Alpha Linux
machine (from all kinds of programs), and they didn't cause any problems.

However, we have had the "wrong byte #8" problem on several machines,
running different kernel versions (2.2.5 and 2.2.9), with different
network cards (3com 509, 3com 905 and Intel) in different locations.
The only common factor is that they are all running Linux.

According to some documentation, "wrong byte #8" is a hardware problem
in the network card, switch or something similar. However, I find that
highly improbable in our case because it happens on lots of different
machines (and on different networks).
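
For anyone wondering where the message comes from: as far as I
understand it, ping fills the echo payload with an incrementing byte
pattern and compares the reply byte by byte, skipping the timestamp at
the start. The sketch below is only an illustration of that idea, not
the actual ping source; the function names and the 8-byte timestamp
length are my assumptions.

```python
TIMESTAMP_LEN = 8  # assumed size of the leading timestamp the check skips


def make_payload(length=56):
    """Build a payload whose byte i holds the value i (mod 256)."""
    return bytes(i & 0xFF for i in range(length))


def first_wrong_byte(sent, received, skip=TIMESTAMP_LEN):
    """Return (index, expected, actual) for the first mismatch, else None."""
    for i in range(skip, min(len(sent), len(received))):
        if sent[i] != received[i]:
            return i, sent[i], received[i]
    return None


sent = make_payload()

# Simulate a reply whose first checked byte came back as zero.
corrupted = bytearray(sent)
corrupted[8] = 0x00

report = first_wrong_byte(sent, bytes(corrupted))
if report is not None:
    index, expected, actual = report
    print(f"wrong data byte #{index} should be {expected:#x} "
          f"but was {actual:#x}")
```

Under these assumptions byte #8 is the very first byte that gets
compared, so a report about byte #8 would mean the mismatch starts
right at the beginning of the checked data rather than somewhere in the
middle of the packet.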

Once in a while all network activity stops, and if we try ping we get
the "wrong byte #8" on all packets. The problem remains for a long time,
unless we do (ifconfig eth0 down; ifconfig eth0 up) - then the problem
stops. This leads me to believe the problem is software related.

My hunch is that some hardware problem causes the Linux kernel to get
into the wrong "state", so that the problem remains until the interface
is taken down and up again.

Does anyone have suggestions for debugging I can do?

--
Ragnar Kjørstad



