Re: Silent corruption on AMD64

From: Andi Kleen
Date: Sun Apr 01 2007 - 09:00:49 EST


Aaron Lehmann <aaronl@xxxxxxxxxxx> writes:

[adding netdev]
[meta-comment: I wish people wouldn't use such unnecessarily broad subjects
-- how is it the x86-64 port's or AMD's fault when you have broken hardware?
Would anybody write "Silent corruption on i386" or "Silent corruption
on Intel" or "Silent corruption on Linux"?]

> On Sat, Mar 31, 2007 at 08:03:16PM -0700, Jim Paris wrote:
> > Since it shows up under heavy load that includes unrelated devices, I
> > think ruling out hardware problems is important. Some suggestions:
>
> I've been able to narrow it down to the Realtek Ethernet card. I can't
> reproduce the problem using onboard Ethernet, whereas the Realtek card
> causes trouble in any slot. However, I still don't know whether it's a
> hardware or software issue, or whether it's caused directly or
> indirectly by the Realtek card.

You could disable the card's hardware receive checksumming with the
appended patch. Linux will then verify checksums itself and hopefully
catch most corruptions (but perhaps not all, because TCP checksums are
not very strong). You can then watch the failed checksums with netstat -s.
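The counters netstat -s prints come from /proc/net/snmp, which has two lines per protocol: a header line of field names and a line of values. The C sketch below pulls one counter straight out of text in that layout. snmp_counter and copy_line are hypothetical helper names, not kernel or net-tools APIs. On kernels of this vintage, TCP segments with bad checksums are counted in Tcp InErrs ("bad segments received" in netstat -s) and bad UDP datagrams in Udp InErrors. Later kernels also export a separate InCsumErrors field.

```c
#include <stdio.h>
#include <string.h>

/* Copy one line (without its '\n') into buf; return the next line or NULL. */
static const char *copy_line(const char *s, char *buf, size_t size)
{
	const char *eol = strchr(s, '\n');
	size_t len = eol ? (size_t)(eol - s) : strlen(s);

	if (len >= size)
		len = size - 1;
	memcpy(buf, s, len);
	buf[len] = '\0';
	return eol ? eol + 1 : NULL;
}

/*
 * Hypothetical helper: look up one counter in text laid out like
 * /proc/net/snmp ("Tcp: RtoAlgorithm ... InErrs ..." followed by
 * "Tcp: 1 200 ..."). Returns -1 if proto or field is absent
 * (note that Tcp MaxConn can legitimately be -1 too).
 */
long long snmp_counter(const char *text, const char *proto, const char *field)
{
	char hdr[1024], val[1024], name[64];
	size_t plen = strlen(proto);
	const char *next = text;

	while (next && *next) {
		next = copy_line(next, hdr, sizeof(hdr));
		if (strncmp(hdr, proto, plen) != 0 || hdr[plen] != ':')
			continue;               /* not this protocol's header */
		if (!next)
			return -1;              /* header with no value line */
		next = copy_line(next, val, sizeof(val));
		if (strncmp(val, proto, plen) != 0 || val[plen] != ':')
			return -1;              /* malformed value line */

		/* Walk field names and values in lockstep after "Proto:". */
		const char *h = hdr + plen + 1;
		const char *v = val + plen + 1;
		long long value;
		int hn, vn;
		while (sscanf(h, "%63s%n", name, &hn) == 1 &&
		       sscanf(v, "%lld%n", &value, &vn) == 1) {
			if (strcmp(name, field) == 0)
				return value;
			h += hn;
			v += vn;
		}
		return -1;
	}
	return -1;
}
```

On a live box you would read /proc/net/snmp into a buffer and compare snmp_counter(buf, "Tcp", "InErrs") before and after a transfer over the suspect card; a rising count means segments were thrown away as bad.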

-Andi

Index: linux-2.6.21-rc3-net/drivers/net/r8169.c
===================================================================
--- linux-2.6.21-rc3-net.orig/drivers/net/r8169.c
+++ linux-2.6.21-rc3-net/drivers/net/r8169.c
@@ -2477,6 +2477,7 @@ static inline int rtl8169_fragmented_fra

 static inline void rtl8169_rx_csum(struct sk_buff *skb, struct RxDesc *desc)
 {
+#if 0
 	u32 opts1 = le32_to_cpu(desc->opts1);
 	u32 status = opts1 & RxProtoMask;

@@ -2485,6 +2486,7 @@ static inline void rtl8169_rx_csum(struc
 	    ((status == RxProtoIP) && !(opts1 & IPFail)))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
 	else
+#endif
 		skb->ip_summed = CHECKSUM_NONE;
 }


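With the patch applied, every received packet is marked CHECKSUM_NONE, so the stack verifies the TCP/UDP checksum in software instead of trusting the card. That checksum is the 16-bit ones'-complement Internet checksum (RFC 1071). The C sketch below computes it and shows why it is not very strong: addition is commutative, so corruption that only reorders 16-bit words goes unnoticed, and 0x0000 and 0xffff are both "zero" in ones'-complement arithmetic. inet_csum is a hypothetical standalone helper, not the kernel's csum_* routines, and it leaves out the TCP pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: RFC 1071 Internet checksum over big-endian
 * 16-bit words. The 32-bit accumulator is ample for packet-sized buffers.
 */
uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	if (len & 1)                    /* pad an odd trailing byte with zero */
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold carries back into the low 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

For example, the buffers 12 34 ab cd and ab cd 12 34 produce the same checksum, as do 00 00 12 34 and ff ff 12 34, so a card or bus that swaps words or turns a zero word into all ones would slip past this check.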