Re: TCPv4 bad checksum - weren't they gone?

Kevin Buhr (buhr@stat.wisc.edu)
08 Jul 1998 18:23:24 -0500


peloy@ven.ra.rockwell.com writes:
>
> I've just got this:
>
> TCPv4 bad checksum from 130.151.17.154:1f90 to 130.151.17.162:0406,
> len=1206/1206/1226
>
> I thought these were gone...
>
> I heard some time ago DaveM saying that these were because of buggy
> implementations of Van Jacobson compression or something like that,
> and that the buggy implementations were third party routers, access
> servers or Windows boxes (please correct me if I am wrong).

While this might be one cause, it is by no means the only one.
And judging from the responses you received on the list, people are
pretty confused about how VJ header compression works.

In a nutshell, VJ compression relies on the TCP checksums to maintain
synchronization in the presence of dropped packets. In practice, if
you get a bad PPP frame (for example, if your kernel or bus is too
swamped to keep up with the modem), it will be discarded at the PPP
level because of its bad PPP frame CRC. (And, yes, that's a CRC and
not a checksum.) But, unless you've enabled an appropriate "kdebug"
flag on the PPP daemon, you won't get a kernel message at all; you
have to check the "RX errors" in the output of "ifconfig ppp0" to see
these errors occurring.
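
For anyone who would rather watch those counters from a script than
eyeball "ifconfig", here is a minimal sketch. It assumes the
/proc/net/dev column layout of recent kernels (receive columns: bytes,
packets, errs, drop, fifo, frame, ...); older kernels lay the file out
differently, so check the header line on your own box first. The
function name and the sample text are made up for illustration.

```python
# Hypothetical helper: pull the receive-side error counters for one
# interface out of text in /proc/net/dev format.
# ASSUMPTION: receive columns are bytes, packets, errs, drop, fifo, frame.

SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  ppp0:  524288     600   17    0    3    14          0         0   131072     400    0    0    0     0       0          0
"""


def rx_error_counters(proc_net_dev_text, iface="ppp0"):
    """Return receive counters for iface as a dict, or None if absent."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name.strip() != iface:
            continue  # header line, or some other interface
        fields = rest.split()
        return {
            "packets": int(fields[1]),
            "errs": int(fields[2]),
            "drop": int(fields[3]),
            "fifo": int(fields[4]),
            "frame": int(fields[5]),
        }
    return None
```

To use it on a live system, pass in open("/proc/net/dev").read() and
compare the "errs" value before and after a transfer; a count that
climbs while checksum messages appear points at dropped PPP frames.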

However, this dropped packet means the compression stream is now out
of synch. Therefore, the next packet, if it arrives undamaged, will
pass the PPP CRC test and, typically, fail the TCP checksum test
because the loss of synchronization leads to an incorrect regeneration
of the TCP header. The sending PPP daemon notices that packets are
being resent, assumes loss of synchronization has occurred, and sends
an uncompressed packet to resynchronize. This is how it's *supposed*
to work. Those interested in the details should consult RFC 1144,
section 4, "Error Handling".
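
The sequence above can be sketched with a toy model. This is NOT the
RFC 1144 encoding: the "header" here is just a 32-bit sequence number,
the compressed form is a plain delta against the previous packet, and
the class and function names are invented for the example. The only
real ingredient is the 16-bit one's-complement Internet checksum. It
shows one frame being lost, the next one rebuilding the wrong header
and failing its checksum, and an uncompressed resend restoring sync.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


class ToySender:
    """Toy compressor: sends a delta against the last sequence number."""

    def __init__(self):
        self.last_seq = None

    def send(self, seq, payload, force_full=False):
        # The checksum covers the ORIGINAL header and travels unchanged.
        csum = internet_checksum(struct.pack("!I", seq) + payload)
        if self.last_seq is None or force_full:
            frame = ("full", seq, csum, payload)
        else:
            frame = ("delta", seq - self.last_seq, csum, payload)
        self.last_seq = seq
        return frame


class ToyReceiver:
    """Toy decompressor: rebuilds the header from its own saved state."""

    def __init__(self):
        self.last_seq = None

    def receive(self, frame):
        kind, value, csum, payload = frame
        if kind == "full":
            seq = value
        else:
            seq = (self.last_seq + value) & 0xFFFFFFFF
        self.last_seq = seq
        ok = internet_checksum(struct.pack("!I", seq) + payload) == csum
        return seq, ok


def simulate_lost_frame():
    """Lose the second frame, then resend the third uncompressed."""
    tx, rx = ToySender(), ToyReceiver()
    f0 = tx.send(1000, b"pkt0")
    tx.send(1004, b"pkt1")        # dropped on the link (bad PPP CRC)
    f2 = tx.send(1008, b"pkt2")   # arrives intact, but state is stale
    results = [rx.receive(f0), rx.receive(f2)]
    resend = tx.send(1008, b"pkt2", force_full=True)
    results.append(rx.receive(resend))
    return results
```

In this model the second received frame is rebuilt with sequence
number 1004 instead of 1008, so its checksum cannot match; the full
resend then puts both ends back in step.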

Yes, disabling VJ compression will make the messages go away, but you
haven't made the *problem* go away. You are still, presumably,
dropping PPP frames for some reason, and the packets they contain must
be resent. You'll just be slaughtering your interactive connection
performance for no reason.

The right thing to do is to find out *why* you are dropping PPP
frames. If you've got a cheap modem without a 16550 UART or if you've
got a flakey bus, you might just have to live with the problem until
you can replace the hardware. Try reducing the baud rate in the
meantime; you'll have to do some benchmarking to determine the best
tradeoff between "being slow" and "being accurate". If you discover
you're inadvertently sharing interrupts with your serial mouse (so
*that's* why moving my mouse causes checksum errors!), you can take
some remedial action.

Or, the cause may be more obscure. I recently installed a new, fast
IDE hard drive on a very slow, obsolete 486/33 with an el-cheapo
motherboard. I immediately received an absolute avalanche of checksum
messages which could be reduced, but not eliminated, using a lower
baud rate out to the modem. It turned out that data transfers with
the new, fast drive were causing interrupts to be masked for way too
long. In this case, a "hdparm -u 1" on the device fixed the problem,
and I don't think I've had a checksum error since.
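
To put rough numbers on "way too long": the serial port has to be
serviced before its receive buffer overflows. The arithmetic below is
a back-of-the-envelope sketch that assumes 8N1 framing (10 bits on
the wire per character), a 1-byte receive buffer on the older UARTs
and the full 16-byte FIFO on a 16550A as an upper bound. The function
name is made up for the example, and real FIFO trigger levels leave
less slack than this.

```python
def overrun_deadline_us(baud, fifo_bytes=1, bits_per_char=10):
    """Microseconds it takes fifo_bytes characters to arrive at baud.

    ASSUMPTION: 10 bits per character (start + 8 data + stop) and a
    receiver that overruns once fifo_bytes characters sit unserviced.
    """
    return fifo_bytes * bits_per_char * 1_000_000 / baud


def report(bauds=(115200, 57600, 38400)):
    """Build (baud, 1-byte deadline, 16-byte deadline) rows."""
    return [
        (baud, overrun_deadline_us(baud, 1), overrun_deadline_us(baud, 16))
        for baud in bauds
    ]
```

At 115200 baud a character arrives roughly every 87 microseconds, so
anything that keeps interrupts masked longer than that can lose data
on an unbuffered UART; halving the baud rate doubles the margin,
which is why a lower rate reduces the errors without curing them.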

Good luck!

Kevin <buhr@stat.wisc.edu>
