Re: Oops in 2.3.X, 2.2.X, and 2.0.3X with FENRIS and NCR53c875 SCSI Driver

Jeff Merkey (jmerkey@timpanogas.com)
Thu, 10 Jun 1999 16:55:17 -0600


Gerard,

I was not spending a lot of time on this right now because I was trying to
get FENRIS tested on as much hardware as possible prior to dumping the code
to our FTP server, but I will have some time tomorrow to look at this in
more depth, if that would be helpful, after I put the tarball up on our FTP
server. What do you want me to look at next?

Jeff

----- Original Message -----
From: Jeff Merkey <jmerkey@timpanogas.com>
To: Gerard Roudier <groudier@club-internet.fr>;
<linux-kernel@vger.rutgers.edu>
Sent: Thursday, June 10, 1999 4:51 PM
Subject: Re: Oops in 2.3.X, 2.2.X, and 2.0.3X with FENRIS and NCR53c875 SCSI
Driver

>
> Gerard,
>
> I am using an American Arium logic analyzer (leased). The bus trace shows
> an errant DMA from a busmastering device that trashed memory in my system
> (I was running a bus trace to track this down). The bus activity indicates
> that a device mapped to a very odd address in the system memory map
> (0xB0000000) was doing something when the DMA occurred. I'm an old hardware
> guy who converted to software, but I'm very proficient with hardware
> analysis tools; in fact, most of my SMP debugging is done with a logic
> analyzer with an inverse assembler. I also wrote SCSI scripts years back
> for the NCR 53C720 (we used SCSI for clustering), so I'm pretty familiar
> with "buggy" SCSI scripts and I've seen this type of behavior before. There
> was also a reset on the SCSI device, which would indicate a reselection was
> in process (I saw a test unit ready and an inquiry command with a status of
> '4' being returned from the device, if this is helpful; I also have a
> leased SCSI analyzer).
>
> I have not looked through your scripts for the driver and don't have a
> scripts compiler (does one ship with Linux?), but I could. Should I do this
> next?
>
> Jeff
>
> ----- Original Message -----
> From: Gerard Roudier <groudier@club-internet.fr>
> To: Jeff Merkey <jmerkey@timpanogas.com>
> Sent: Thursday, June 10, 1999 3:34 PM
> Subject: Re: Oops in 2.3.X, 2.2.X, and 2.0.3X with FENRIS and NCR53c875
> SCSI Driver
>
>
> >
> >
> > On Thu, 10 Jun 1999, Jeff Merkey wrote:
> >
> > > Alan,
> > >
> > > None of the other setups produce the "..could not get a free page.."
> > > message. This looks like a subtle race condition with this particular
> > > SCSI driver. I will try some other scenarios.
> >
> > Dear Jeff,
> >
> > The first thing you should want to do prior to posting any diagnostic of
> > yours to the kernel list could be to, at least, check that you are not
> > just announcing some triviality or claiming some stupidity. Note that I
> > didn't see any triviality in your postings.
> >
> > The second thing is to provide maintainers with relevant traces, logs,
> > source code, scripts that trigger the problem, an accurate description of
> > the problem, etc. Your opinion may be interesting, but only if it is based
> > on relevant observations, and, by the way, something that is just claimed
> > as "looking like subtle ..." has nothing to do with relevance, in my
> > opinion.
> >
> > Now, if you really are quite sure that the SCSI driver is the thing that
> > breaks your system, I invite you to take the biggest possible gun and
> > shoot it immediately.
> >
> > I am the maintainer of some driver that you may have used on your system.
> > Most reports from people who experience problems when using such
> > drivers generally accuse the driver first. The reality _is_ that most
> > of the time, the problem has been proved to be elsewhere.
> >
> > A kernel list is public. When you claim 'I think something is broken'
> > in public, then a non-negligible number of readers may just believe you
> > or just want to believe you.
> >
> > Report the problem as you observe it, please, and avoid comments that
> > are just bare claims. Thanks.
> >
> > Btw, the URL that hosts driver updates is the following:
> > ftp://ftp.tux.org/pub/roudier/drivers/
> >
> > Gérard.
> >
> > >
> > > Jeff
> > >
> > >
> > > ----- Original Message -----
> > > From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> > > To: Jeff Merkey <jmerkey@timpanogas.com>
> > > Cc: <jmerkey@timpanogas.com>; <linux-kernel@vger.rutgers.edu>
> > > Sent: Wednesday, June 09, 1999 5:32 PM
> > > Subject: Re: Oops in 2.3.X, 2.2.X, and 2.0.3X with FENRIS and
> > > NCR53c875 SCSI Driver
> > >
> > >
> > > > > > Also shows up single processor (different machine, same chipset
> > > > > > 53C875).
> > > > > > Looks like a possible SCSI Scripts error. FENRIS, unlike UNIX
> > > > > > FS's, will talk to more than one disk or partition concurrently
> > > > > > since a volume can have multiple segments striped across several
> > > > > > drives. May be exposing some timing condition with the way I am
> > > > > > calling breada and bread.
> > > >
> > > > > The raid code already does that kind of concurrency. Now the
> > > > > important clue is probably
> > > >
> > > > > > More on this -- am also getting "..could not get a free page...".
> > > >
> > > > > This indicates a memory allocation failed. That means it's more
> > > > > likely someone doesn't check a get_free_page/kmalloc return and
> > > > > continues blissfully into oblivion mode.
> > > >
> > > > > Could be the driver, fs or scsi layer. Are your other test setups
> > > > > producing a "free page..." message too? (i.e. the working ones)
> > > >
> > > >
> > >
> > >
> > > -
> > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> > > > in the body of a message to majordomo@vger.rutgers.edu
> > > Please read the FAQ at http://www.tux.org/lkml/
> > >
> > >
> >
> >
>
