Re: sata io freeze, 2.6.18 and also 2.6.24 [the real culprit]

From: Luis Sousa
Date: Thu Dec 25 2008 - 04:20:23 EST


Dear Linux developers,

I carried this thread on in April 2008 because my computer was crashing every other day. I tried and tried and thought I had reached a solution many a time, but eventually they all proved illusive. Eventually the box kept crashing every other day, applications often dumped core and the only SysRq combination to work was the B one.

But then my motherboard burned out. So I went and bought another one. And magically all my problems disappeared. Interestingly enough, the temperature seems to have gone down at least 15°C.

I'm not sure if this can still be a kernel issue, but it definitely seems not to me. I had installed FreeBSD on the system back then and it would crash also. Anyway, my faith in Linux has risen up once again. This system seems rock solid now, it's been a while.

I'd like to say thanks to all, and I'm glad to have such a great stable system once again. Thank you for developing this incredible system, it is definitely a superior system for me, and "superior" in this case is a self-sufficient term.

Best regards,
--
Luis Sousa

--- Em seg, 14/4/08, Luis Sousa <ls.luis_sousa@xxxxxxxxxxxx> escreveu:
De: Luis Sousa <ls.luis_sousa@xxxxxxxxxxxx>
Assunto: Re: sata io freeze, 2.6.18 and also 2.6.24 [the real culprit]
Para: devzero@xxxxxx
Cc: jeff@xxxxxxxxxx, linux-ide@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
Data: Segunda-feira, 14 de Abril de 2008, 22:24

--- devzero@xxxxxx escreveu:

> so - you told, you have your disk attached to this controller:
>
> 00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID
Controller (rev
> 80)
> Subsystem: Micro-Star International Co., Ltd. Unknown device 0430
> Flags: bus master, medium devsel, latency 32, IRQ 16
> I/O ports at ec00 [size=8]
> I/O ports at e800 [size=4]
> I/O ports at e400 [size=8]
> I/O ports at e000 [size=4]
> I/O ports at dc00 [size=16]
> I/O ports at d800 [size=256]
> Capabilities: <access denied>
>
> and use sata_via module - and your system works stable with nosmp, but
freezes very soon
> on heavy i/o with smp. disabling dma makes no difference.
>
> correct ?

Correct.

But I'm not sure it's an exclusive i/o thing. I've had the box
crash in
situations where I don't think heavy i/o is involved.. for instance, it
crashed two times just as I closed the iceape browser.

> then i assume this could be an smp issue, probably a bug in sata_via and
may need further
> investigation.
>
> there seem no noticeable changes in that driver since 2.6.24 - but maybe
it`s worth
> trying 2.6.25-rc9 to see if it behaves different?

I'm actually running 2.6.25-rc7-git6 now.
(so I really don't think that would make much of a difference)

> absolutely nothing in dmesg on freeze ?

Absolutely nothing. Never.

> maybe you could try catching some message with attaching serial console?

Hmm, I have no serial consoles here, and my spare box is dead.

Unless I can attach netconsole to a windows box.

I can probably crash it in console mode, without Xorg. Then I hope something
might show up in another vt at least.

> how reliably can you crash that box with lots of i/o ? just random crashes
or after a
> certain amount of data written/read ?

There is definitely a random element in it. But I can reliably crash it.
Just trigger some CPU and i/o bound processes. I've crashed it with
operations
which would only read from the disk, for instance unpacking to /dev/null.
Sometimes it takes longer, though.

> could you try if just reading from the raw blockdevice makes a differnce ?
(dd
> if=/dev/sda of=/dev/zero)

Yes, I can try that. But please bear with me as I won't be working full
time
in this. I hope we can get to the bottom of this, but it might not be as quick
as we'd like.

Thank you very much for the help.

I'll write back asap.

Best regards,
--
Luis Sousa

> > -----Ursprüngliche Nachricht-----
> > Von: "Luis Sousa" <ls.luis_sousa@xxxxxxxxxxxx>
> > Gesendet: 14.04.08 12:14:26
> > An: devzero@xxxxxx
> > CC: linux-kernel@xxxxxxxxxxxxxxx
> > Betreff: Re: sata io freeze, 2.6.18 and also 2.6.24 [the real
culprit]
>
>
> >
> > Hello,
> >
> > I did as planned, and I dare to say now, with
> > relative certainty, that the reason I have had all
> > these instabilities, including application crashes,
> > sneaky corrupt backups, and the dreaded freezes
> > every other day was SMP and HyperThreading not working
> > properly. Booting with nosmp gives me a stable system.
> > Booting only with the sata-related arguments had
> > the system freeze overnight.
> >
> > Sincerely,
> > --
> > Luis Sousa
> >
> > --- devzero@xxxxxx escreveu:
> >
> > > hi luis,
> > >
> > > glad that it worked - but let us better call this a possible
workaround, not a
> solution.
> > >
> > > mind that you loose lots of performance with this, because all
i/o is handled by the
> cpu
> > > now.
> > >
> > > > If everything goes well, I'll reduce the command line
> > > > in order to narrow the problem further down.
> > >
> > > yes, please!
> > >
> > > regards
> > > roland
> > >
> > >
> > > > -----Ursprüngliche Nachricht-----
> > > > Von: "Luis Sousa"
<ls.luis_sousa@xxxxxxxxxxxx>
> > > > Gesendet: 12.04.08 17:34:39
> > > > An: devzero@xxxxxx
> > > > CC: linux-kernel@xxxxxxxxxxxxxxx
> > > > Betreff: Re: sata io freeze, 2.6.18 and also 2.6.24
[POSSIBLE SOLUTION]
> > >
> > >
> > > >
> > > > Hello all,
> > > >
> > > > Thanks to valuable help from roland, I was able to
> > > > boot with the following options:
> > > >
> > > > ``nosmp ide=nodma libata.dma=0''
> > > >
> > > > after what I ran an agressive stability test for a
> > > > while, with no problem whatsoever. Maybe it's too
> > > > soon to say, but I have a good feeling this time. It
> > > > seems roland figured it out.
> > > >
> > > > If everything goes well, I'll reduce the command line
> > > > in order to narrow the problem further down.
> > > >
> > > > (and indeed, this is a hyper-threading single core CPU)
> > > >
> > > > Sincerely,
> > > > --
> > > > Luis Sousa
> > > >
> > > > --- devzero@xxxxxx escreveu:
> > > >
> > > > > did you try UP-kernel or tried booting with
"nosmp" and does that make any
> difference
> > > ?
> > > > >
> > > > > >This is a 2-processor CPU:
> > > > > ># cat /proc/cpuinfo
> > > > > >processor : 0
> > > > > >vendor_id : GenuineIntel
> > > > > >cpu family : 15
> > > > > >model : 4
> > > > > >model name : Intel(R) Pentium(R) 4 CPU
3.00GHz
> > > > >
> > > > > most likely this just "appears" as 2-way
processor , but i´m quite sure this is
> just
> > > a
> > > > > single-core CPU, but with HyperThreading enabled.
> > > > >
> > > > >
> > > > > regards
> > > > > roland
> > > > >
> > > > >
> > > > >
> > > > > List: linux-kernel
> > > > > Subject: sata io freeze, 2.6.18 and also 2.6.24
> > > > > From: Luis Sousa <ls.luis_sousa () yahoo !
com ! br>
> > > > > Date: 2008-03-28 22:24:48
> > > > > Message-ID: 739784.81690.qm () web46009 ! mail ! sp1 !
yahoo ! com
> > > > > [Download message RAW]
> > > > >
> > > > > Hello,
> > > > >
> > > > > I've been having consistent hard system freezes
for a
> > > > > long time, every 2 days or so, and finally decided to
> > > > > move to the most recent stable kernel. Unfortunatelly
> > > > > that didn't fix it. The freezes seem to be
io-related;
> > > > > they started since I moved my drive to sata. They seem
> > > > > to happen mostly when there's an io-intensive
operation,
> > > > > like extracting a big archive; a reboot is needed.
> > > > >
> > > > > Nothing ever shows up in the logs.
> > > > >
> > > > > I'm using the following drive:
> > > > >
> > > > > ATA device, with non-removable media
> > > > > Model Number: MAXTOR STM3160215AS

> > > > > Serial Number: 6RA2TMJ4
> > > > > Firmware Revision: 3.AAD
> > > > >
> > > > > Linux localhost 2.6.24.4 #4 SMP Fri Mar 28 14:51:50
BRT 2008 i686 GNU/Linux
> > > > >
> > > > > This is a 2-processor CPU:
> > > > >
> > > > > # cat /proc/cpuinfo
> > > > > processor : 0
> > > > > vendor_id : GenuineIntel
> > > > > cpu family : 15
> > > > > model : 4
> > > > > model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
> > > > > stepping : 9
> > > > > cpu MHz : 3013.697
> > > > > cache size : 1024 KB
> > > > >
> > > > > (same goes for processor : 1)
> > > > >
> > > > > Sincerely,
> > > > > --
> > > > > Luis Sousa


Veja quais são os assuntos do momento no Yahoo! +Buscados
http://br.maisbuscados.yahoo.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/