Mylex dac960 not SMP-safe?

From: Olli Lounela (olli@mpoli.fi)
Date: Mon Feb 12 2001 - 04:27:52 EST


Hi all!

I'm having mucho trouble trying to run any kind of kernel on the following
hardware:

    - Intel PR440FX, latest BIOS (1.00.09DI)
    - 2 * Intel PPro/200, stepping 09
    - Onboard Intel eepro100/B with 82557, driver in 2.2.18 reports
      assembly 645520-034
    - Onboard Adaptec 7880
    - Mylex AcceleRAID 250 (code DACPTLM-1), 8 MB ECC cache SIMM.
    - 3 * IBM DNES-309170 9 GB LVD-SCSI disks in RAID-5 setup (Mylex
      recognizes them as LVD)

Basically, there's something there that just locks the machine up.

Apparently the keyboard controller is peculiar, or else none of the keyboards
I have access to can produce SysRq: the SysRq key produces four raw
scancodes instead of one, and the documentation doesn't say how to handle this
case. I'd much rather force an oops and add the trace here.

The main idea is to boot from the dac960 and remove any old disks still
attached to the aic7880. The dac960 is in a bus-mastering slot, and I did try
removing the bus-mastering jumper. Main memory is ECC and has been swapped
(the available SIMMs are one 64 MB and one 128 MB).

Symptoms when booting 2.4 kernels from the dac960

  - Booting with SMP

    * Both 2.4.1-ac9 and 2.4.2-pre3 hang at dac960 initialization

  - Booting an SMP-compiled kernel with the nosmp option

    * The dac960 gets initialized, but the built-in eepro100/B hangs immediately

Nothing is written to the log, and the machine just stops. To me it looks
like a deadlock. Getting the machine to react in any way requires a hardware
reset. Softdog doesn't trigger, so apparently the kernel is still running, at
least to a degree.

With newer 2.2 kernels (like 2.2.19pre9) the machine works for a while and
then hangs at random, apparently triggered by network traffic. This also
occurs with stock RedHat 2.2.16-3.

The machine previously ran Linux faultlessly for a long time (years)
without the dac960. It's hard to determine the exact kernel version,
but apparently 2.2.14 worked from 10 March 2000 until December 2000.
Accordingly, while the dac960 card itself may be the culprit, I strongly
suspect the dac960 driver is not SMP-safe.

Booting from the aic7880, the system works with a 2.2.18 kernel (with the dac960
driver), and I can compile a kernel with 'make -j 20', but the machine still
hangs within a few minutes if I simultaneously fetch 2.4.1 from a mirror.
The latter also hung the machine with RH 2.2.16-3 (2.2.19pre9 was a bit better
but did the same). Given the dismaying results above, I haven't tested
2.4 kernels booting from the aic7880.

With almost any 2.2 kernel, when booting with nosmp or a UP-compiled kernel, I
seem to be able to fetch and compile a new kernel. Booting 2.2.14-SMP without
the dac960 driver, fetching 2.4.1 while compiling a kernel with -j15 did not
hang the machine.

Apparently there's some strange interaction between SMP on the PPro and the
eepro100/B and dac960 drivers. I'm a bit at a loss on how to approach the
problem; any help will be appreciated.

lspci -vvvxx attached.

Oh, and I read the list from a www-archive, so please CC me.

-- 
    Olli               ...and he thought I'm serious! Hahahaha...





This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 21:00:18 EST