Re: Oops in 2.3.X, 2.2.X, and 2.0.3X with FENRIS and NCR53c875 SCSI Driver

Jeff Merkey (jmerkey@timpanogas.com)
Thu, 10 Jun 1999 16:51:53 -0600


Gerard,

I am using an American Arium Logic analyzer (leased). The bus trace shows
an errant DMA from a busmastering device that trashed memory in my system (I
was running a bus trace to track this down). The bus activity indicates
that a device mapped to a very odd address in the system memory map
(0xB0000000) was doing something when the DMA occurred. I'm an old hardware
guy who converted to software, but I'm very proficient with hardware
analysis tools; in fact, most of my SMP debugging is done with a logic
analyzer with an inverse assembler. I also wrote SCSI scripts years back
for the NCR 53C720 (we used SCSI for clustering) so I'm pretty familiar with
"buggy" SCSI scripts and I've seen this type of behavior before. There was
also a reset on the SCSI device, which would indicate a reselection was in
progress. I saw a TEST UNIT READY and an INQUIRY command with a status of '4'
being returned from the device, if this is helpful (I also have a leased
SCSI analyzer).

I have not looked through the driver's scripts and don't have a scripts
compiler (does one ship with Linux?), but I could do so. Should I do this
next?

Jeff

----- Original Message -----
From: Gerard Roudier <groudier@club-internet.fr>
To: Jeff Merkey <jmerkey@timpanogas.com>
Sent: Thursday, June 10, 1999 3:34 PM
Subject: Re: Oops in 2.3.X, 2.2.X, and 2.0.3X with FENRIS and NCR53c875 SCSI Driver

>
>
> On Thu, 10 Jun 1999, Jeff Merkey wrote:
>
> > Alan,
> >
> > None of the other setups produce the "..could not get a free page.."
> > message. This looks like a subtle race condition with this particular
> > SCSI driver. I will try some other scenarios.
>
> Dear Jeff,
>
> The first thing you should do before posting any diagnosis of
> yours to the kernel list is, at least, to check that you are not
> just announcing some triviality or claiming some stupidity. Note that I
> didn't see any triviality in your postings.
>
> The second thing is to provide maintainers with relevant traces, logs,
> source code, scripts that trigger the problem, an accurate description of
> the problem, etc. Your opinion may be interesting, but only if it is based
> on relevant observations, and, by the way, something that is just claimed
> as "looking like subtle ..." has nothing to do with relevance, in my
> opinion.
>
> Now, if you really are quite sure that the SCSI driver is the thing that
> breaks your system, I invite you to take the biggest possible gun and shoot
> it immediately.
>
> I am the maintainer of some driver that you may have used on your system.
> Most reports from people who experience problems when using such
> drivers accuse the driver first. The reality _is_ that, most
> of the time, the problem has proved to be elsewhere.
>
> A kernel list is public. When you claim 'I think something is broken'
> in public, a non-negligible number of readers may just believe you
> or just want to believe you.
>
> Report the problem as you observe it, please, and avoid comments that
> are just bare claims. Thanks.
>
> Btw, the URL that hosts driver updates is the following:
> ftp://ftp.tux.org/pub/roudier/drivers/
>
> Gérard.
>
> >
> > Jeff
> >
> >
> > ----- Original Message -----
> > From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> > To: Jeff Merkey <jmerkey@timpanogas.com>
> > Cc: <jmerkey@timpanogas.com>; <linux-kernel@vger.rutgers.edu>
> > Sent: Wednesday, June 09, 1999 5:32 PM
> > Subject: Re: Oops in 2.3.X, 2.2.X, and 2.0.3X with FENRIS and NCR53c875 SCSI Driver
> >
> >
> > > > Also shows up single processor (different machine, same chipset 53C875).
> > > > Looks like a possible SCSI Scripts error. FENRIS, unlike UNIX FS's, will
> > > > talk to more than one disk or partition concurrently since a volume can
> > > > have multiple segments striped across several drives. May be exposing
> > > > some timing condition with the way I am calling breada and bread.
> > >
> > > The raid code already does that kind of concurrency. Now the important
> > > clue is probably
> > >
> > > > More on this -- am also getting "..could not get a free page...".
> > >
> > > This indicates a memory allocation failed. That means it's more likely
> > > someone doesn't check a get_free_page/kmalloc return and continues
> > > blissfully into oblivion mode.
> > >
> > > Could be the driver, fs, or SCSI layer. Are your other test setups
> > > (i.e. the working ones) producing a "free page..." message too?
> > >
> > >
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.rutgers.edu
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> >
>
>
