Hardware bug or kernel bug?

From: David Johnson
Date: Thu Oct 12 2006 - 12:53:52 EST


Hi,

I have a major problem on a system, and I've been unable to track down the
cause. When I use scp to transfer a large file (a few GB) over the network
(at 100 Mbit/s), the system reboots after about 5-10 minutes of transfer. No
errors are reported, just a reboot. I have another identical system which
exhibits the same behaviour.

The system is a Supermicro P4SCT+ with a hyperthreading P4. I've posted the
dmesg here:
http://www.david-web.co.uk/download/dmesg

I initially tried a different NIC in case that was at fault, but the results
were the same.

Changing the interrupt timer frequency in the kernel makes a difference:
100Hz - system reboots instantly when transfer is started
250Hz - reboots after a few seconds
1000Hz - reboots after 5-10 minutes

As the problem appears to be interrupt-related, I disabled the I/O APIC in the
BIOS (after first having to disable hyperthreading), which resulted in the
system lasting a bit longer before rebooting. I then tried disabling the
Local APIC as well, but this made no difference.
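
One way to check the interrupt theory would be to compare two snapshots of
/proc/interrupts taken a moment apart and see which IRQ lines are growing
fastest during a transfer. The sketch below is only an illustration: it
assumes the usual /proc/interrupts layout (a header row of CPU columns, then
one "label: counts... description" row per interrupt), and the function names
parse_interrupts, interrupt_deltas and sample are made up for this example.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts content into {irq_label: total across CPUs}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Per-CPU counters come first; rows such as ERR may have fewer columns.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def interrupt_deltas(before, after):
    """Return (irq, increase) pairs between two snapshots, largest first."""
    deltas = {irq: after[irq] - before.get(irq, 0) for irq in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


def sample(path="/proc/interrupts", interval=1.0):
    """Take two snapshots `interval` seconds apart and return the deltas."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_deltas(before, after)
```

Calling sample() once while the machine is idle and once while scp is running
would show whether the NIC's interrupt line dominates under load. It only
reports counts and will not by itself explain a reboot.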

I've tested with CentOS's 2.6.9 kernel and with a vanilla 2.6.17.13 kernel,
and the results are the same with both.

Does anyone have any idea whether this is likely to be a hardware problem or a
kernel problem?
Any suggestions for more ways to debug this would be gratefully received.

Thanks,
David.