Re: Trivial hard lockup, SCSI, 2.4.23

From: Ian Soboroff
Date: Thu Dec 18 2003 - 09:23:39 EST


Marcelo Tosatti <marcelo.tosatti@xxxxxxxxxxxx> writes:

> On Tue, 16 Dec 2003, Ian Soboroff wrote:
>
>>
>> I've found that I can lock a machine running 2.4.23aa1 by trying to
>> access a nonexistent SCSI device. In other words, if a userspace
>> program tries to access /dev/sdd, but no device is attached on any
>> SCSI bus using that device node, the machine locks hard.
>>
>> We found this when we disconnected a SCSI hardware RAID from a server,
>> but forgot to remove the cron job which checked its status.
>>
>> The lockup leaves no errors whatsoever in the logs. I finally tracked
>> it down with the NMI watchdog.
>
> What did the NMI oopser report ?

It didn't get logged, so I don't have the full trace, but I remember
that it indicated a program we run called raidm, which checks the
status of our RAIDs periodically. We'd forgotten to gun the instance
which was watching the RAID we disconnected.

Running raidm on a connected RAID, I can see that it opens the device
and sends some ioctls:

...
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
open("/dev/sda1", O_RDONLY) = 3
ioctl(3, FIBMAP, 0xbfff4840) = 0
ioctl(3, FIBMAP, 0xbfff4840) = 0
ioctl(3, FIBMAP, 0xbfff4910) = 134217730
ioctl(3, FIBMAP, 0xbfff4910) = 134217730
ioctl(3, FIBMAP, 0xbfff4910) = 134217730
ioctl(3, FIBMAP, 0xbfff4910) = 134217730
ioctl(3, FIBMAP, 0xbfff4910) = 134217730
ioctl(3, FIBMAP, 0xbfff4910) = 134217730
time(NULL) = 1071757056
open("/var/log/raidm.log", O_WRONLY|O_APPEND|O_CREAT, 0666) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=5049303, ...}) = 0
...

I can't afford to kill the machine again, otherwise I'd trigger the
oops again. Shouldn't scsi or aic7xxx not let me open(2) the device
if nothing's attached?

Ian

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/