sx8 on x86-64 instability with kernels >= 2.6.16
From: David Dumas
Date: Thu Oct 12 2006 - 11:51:39 EST
I have a dual opteron system with an sx8, and have observed repeatable
sx8 failures under x86-64 linux 2.6.16 and later. The problems are
not observed on x86-64 kernels 2.6.15 and earlier, nor on _any_ i386
kernel.
I first noticed the problem when trying to build a md RAID on the sx8.
The resync always stops after < 2 seconds (at different points, but
always early). Stats in /proc/mdstat continue to update, but show no
progress, and mdadm --stop causes a soft lockup. Note that building a
RAID consisting of disks on the onboard SATA controller works fine.
The problem is not specific to md however. While a single instance of
bonnie++ typically succeeds, running two simultaneous instances on
different disks connected to the sx8 always results in one of them
freezing in disk wait. Sometimes a soft lockup occurs. I have never
observed any error messages in syslog or on the console.
The dependence of the problem on specific kernel versions and the
x86-64 architecture seems to rule out hardware issues (though I tried
replacing cables, reseating card, etc.).
As for kernel configuration, I have tried both debian stock amd64
kernels and a custom kernel configured as close to the working 2.6.15
kernel as possible. The problem is the same in each case. I always
keep the sx8 "max_queue" parameter at the safe value of 1, so this is
not related to the instability observed by the driver developer for
larger queues.
This is especially perplexing since the sx8 driver was only subject to
trivial patches in the 2.6.16 release:
[PATCH] md: remove slashes from disk names when creation dev names in sysfs
[PATCH] mutex subsystem, semaphore to completion: SX8
Perhaps a seemingly unrelated patch affects the sx8 driver, but only
on x86-64? And advice or suggestions would be appreciated.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/