Re: XFS strangeness, xfs_db out of memory

From: Robin Rosenberg
Date: Sun Oct 24 2004 - 16:08:27 EST


On Sunday 24 October 2004 13.53, Jan Engelhardt wrote:
> >I was testing a tiny script on top of xfs_fsr to show fragmentation and
> > the results of defragmentation. As a result of fine tuning the output I
> > ran the script repeatedly and suddenly got an error from find (unknown error
> > 999, if my memory serves me; it scrolled off the screen).
> >
> >The logs show this.
> >Oct 24 08:06:50 xine kernel: hda: dma_timer_expiry: dma status == 0x21
> >Oct 24 08:07:00 xine kernel: hda: DMA timeout error
> >Oct 24 08:07:00 xine kernel: hda: dma timeout error: status=0xd0 { Busy }
> >Oct 24 08:07:00 xine kernel:
> >Oct 24 08:07:00 xine kernel: hda: DMA disabled
> >Oct 24 08:07:00 xine kernel: ide0: reset: success
>
> Hi,
>
> That looks to me like your HD is going to die sometime in the future...
That's for certain. The question is whether it's in the near future. It's
only a couple of months old.

> >How bad is that for XFS?... The error isn't permanent it seems.
>
> Usually nothing. Expect <any fs> to struggle when such IO/DMA errors
> happen.
What I'm wondering is whether XFS ever saw the problem, or whether the kernel
retried the operation on its own. I'm really curious as to what happened.

> >After that xfs_db -r /dev-with-home -c "frag -v" gives me an out-of-memory
> >error after a while, consistently.
>
> XFS has probably picked up a malicious value due to the disk error, and as
> such allocates that much. Probably more than you got.
Or these errors come from previous unclean poweroffs (i.e. a hung system).

> >I ran the script repeatedly and suddenly got an error from find (unknown
> > error 999 if my
>
> If you reboot, and restart this repeated test, does it always error out at
> the same time and spot (and with the same error 0x21/0x90), e.g. the 100'th
> instance of xfs_db?
>
> Please also try a badblocks -vv /dev/hdXY (or appropriate) repeatedly. If
> it finds something there after a lot of runs (at least as much as you
> needed to find out the fragmentation), there's definitely something wrong
> with the HD, not XFS.
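
For reference, the repeated check suggested above could be scripted roughly
like this. It is only a sketch: the helper names badblocks_pass and
repeat_passes are made up for illustration, /dev/hdXY is the same placeholder
as in the quoted text, and the code reads the -o output file rather than
relying on the exit status of badblocks.

```python
import os
import subprocess
import tempfile


def badblocks_pass(device):
    """Run one read-only badblocks pass and return the bad block numbers found."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "badblocks.out")
        # -v: verbose; -o FILE: write the bad block numbers to FILE.
        # The exit status is deliberately ignored (assumption: the output
        # file is the more reliable indicator).
        subprocess.run(["badblocks", "-v", "-o", out_path, device])
        if not os.path.exists(out_path):
            return []
        with open(out_path) as fh:
            return [int(tok) for tok in fh.read().split()]


def repeat_passes(run_pass, runs):
    """Call run_pass() `runs` times; map pass number -> bad blocks for failing passes."""
    failures = {}
    for n in range(1, runs + 1):
        bad = run_pass()
        if bad:
            failures[n] = bad
    return failures
```

Something like repeat_passes(lambda: badblocks_pass("/dev/hdXY"), 20), run as
root, would then show which passes, if any, reported bad blocks.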

I've tried it a few times, nothing so far. Thinking about it again, I have
actually seen this (or a similar error) before. The logs only contain this
instance of the error, so it must be at least a month since it last happened.
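
To check the logs for earlier occurrences, a small filter along these lines
might do. This is a rough sketch: the function name find_dma_events is made
up, and the pattern is inferred only from the kernel lines quoted above, so
other kernels or syslog formats may need a different expression.

```python
import re

# Matches syslog lines such as
#   "Oct 24 08:07:00 xine kernel: hda: DMA timeout error"
# The layout (timestamp, host, "kernel:", device, message) is an assumption
# based on the lines quoted in this thread.
DMA_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<dev>hd[a-z]):\s+(?P<msg>.*dma.*)$",
    re.IGNORECASE,
)


def find_dma_events(lines):
    """Return (timestamp, device, message) for every IDE DMA kernel message."""
    events = []
    for line in lines:
        m = DMA_LINE.match(line.rstrip("\n"))
        if m:
            events.append((m.group("stamp"), m.group("dev"), m.group("msg")))
    return events
```

Feeding it the lines of the system log (the path depends on the
distribution) gives one tuple per DMA message, which makes it easy to see how
often and when the error has shown up.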

-- robin