Re: XFS internal error (memory corruption)

From: Török Edwin
Date: Tue Jul 05 2011 - 09:38:19 EST


On 07/05/2011 04:09 PM, Dave Chinner wrote:
> On Tue, Jul 05, 2011 at 09:03:19AM +0300, Török Edwin wrote:
>> Hi,
>>
>> Yesterday when running 'shutdown -Pfh now', it hung using 99% CPU in sys [*]
>> Looking at the console there was a message about XFS "Corruption of in-memory data detected", and about XFS_WANT_CORRUPTED_GOTO.
>
> So you had a btree corruption.
>
>> Had to shutdown the machine via SysRQ u + o.
>>
>> Today when I booted I got this message:
>> [ 9.786494] XFS (md1p2): Mounting Filesystem
>> [ 9.927590] XFS (md1p2): Starting recovery (logdev: /dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5)
>> [ 10.385941] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1638 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff8122b80e
>> [ 10.385943]
>> [ 10.386007] Pid: 1990, comm: mount Not tainted 3.0.0-rc5 #155
>> [ 10.386009] Call Trace:
>> [ 10.386014] [<ffffffff812551ca>] xfs_error_report+0x3a/0x40
>> [ 10.386017] [<ffffffff8122b80e>] ? xfs_free_extent+0xce/0x120
>> [ 10.386019] [<ffffffff81227e06>] ? xfs_alloc_lookup_eq+0x16/0x20
>> [ 10.386021] [<ffffffff81228f4a>] xfs_free_ag_extent+0x6aa/0x780
>> [ 10.386023] [<ffffffff8122b80e>] xfs_free_extent+0xce/0x120
>> [ 10.386026] [<ffffffff8127b0ff>] ? kmem_zone_alloc+0x5f/0xe0
>> [ 10.386029] [<ffffffff81268e9f>] xlog_recover_process_efi+0x15f/0x1a0
>> [ 10.386031] [<ffffffff8126ab26>] xlog_recover_process_efis.isra.4+0x76/0xc0
>> [ 10.386033] [<ffffffff8126de62>] xlog_recover_finish+0x22/0xc0
>> [ 10.386035] [<ffffffff81265aa4>] xfs_log_mount_finish+0x24/0x30
>> [ 10.386038] [<ffffffff81270aab>] xfs_mountfs+0x45b/0x720
>> [ 10.386040] [<ffffffff81288741>] xfs_fs_fill_super+0x1f1/0x2e0
>> [ 10.386042] [<ffffffff811573aa>] mount_bdev+0x1aa/0x1f0
>> [ 10.386044] [<ffffffff81288550>] ? xfs_parseargs+0xb90/0xb90
>> [ 10.386046] [<ffffffff812866b0>] xfs_fs_mount+0x10/0x20
>> [ 10.386048] [<ffffffff81157c3e>] mount_fs+0x3e/0x1b0
>> [ 10.386051] [<ffffffff81171807>] vfs_kern_mount+0x57/0xa0
>> [ 10.386052] [<ffffffff81171c4f>] do_kern_mount+0x4f/0x100
>> [ 10.386054] [<ffffffff811732dc>] do_mount+0x19c/0x840
>> [ 10.386057] [<ffffffff8110fa12>] ? __get_free_pages+0x12/0x50
>> [ 10.386059] [<ffffffff81172fc5>] ? copy_mount_options+0x35/0x170
>> [ 10.386061] [<ffffffff81173d0b>] sys_mount+0x8b/0xe0
>> [ 10.386064] [<ffffffff814c19fb>] system_call_fastpath+0x16/0x1b
>> [ 10.386071] XFS (md1p2): Failed to recover EFIs
>> [ 10.386097] XFS (md1p2): log mount finish failed
>> [ 10.428562] XFS (md1p3): Mounting Filesystem
>> [ 10.609949] XFS (md1p3): Ending clean mount
>>
>> FWIW I got a message about EFIs yesterday too, but everything else worked:
>> Jul 4 09:42:54 debian kernel: [ 11.439861] XFS (md1p2): Mounting Filesystem
>> Jul 4 09:42:54 debian kernel: [ 11.599815] XFS (md1p2): Starting recovery (logdev: /dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5)
>> Jul 4 09:42:54 debian kernel: [ 11.787980] XFS (md1p2): I/O error occurred: meta-data dev md1p2 block 0x117925a8 ("xfs_trans_read_buf") error 5 buf c
>> ount 4096
>> Jul 4 09:42:54 debian kernel: [ 11.788044] XFS (md1p2): Failed to recover EFIs
>> Jul 4 09:42:54 debian kernel: [ 11.788065] XFS (md1p2): log mount finish failed
>> Jul 4 09:42:54 debian kernel: [ 11.831077] XFS (md1p3): Mounting Filesystem
>> Jul 4 09:42:54 debian kernel: [ 12.009647] XFS (md1p3): Ending clean mount
>
> Looks like you might have a dying disk. That's a IO error on read
> that has been reported back to XFS, and it warned that bad things
> happened. Maybe XFS should have shut down, though.

Ah I had /dev/sdd kicked out of the RAID array, and a lot of SATA EH resets.
Those were apparently caused by a loose SATA cable:
https://lkml.org/lkml/2011/7/5/22
Those errors were for /dev/sdd though, not /dev/sdi (the XFS log disk).

Once I re-plugged all the SATA cables, and re-added /dev/sdd to the RAID array, all seemed well:

Jul 4 09:59:22 debian kernel: [ 11.613073] XFS (md1p2): Mounting Filesystem
Jul 4 09:59:22 debian kernel: [ 11.782051] XFS (md1p2): Starting recovery (logdev: /dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5)
Jul 4 09:59:22 debian kernel: [ 12.020766] XFS (md1p2): Ending recovery (logdev: /dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5)
Jul 4 09:59:22 debian kernel: [ 12.031038] XFS (md1p3): Mounting Filesystem
Jul 4 09:59:22 debian kernel: [ 12.196238] XFS (md1p3): Ending clean mount

But when I shutdown the system later that day I got that log corruption error.
Could it be that the log / some data on the disk was still corrupted (despite the successful mount, and RAID resync), which caused the
log error later?

The log is not on the RAID array though, it is on a separate disk, and that disk is indeed older than all
the other disks. Still SMART doesn't show any errors, and I've run a SMART short self-test, and conveyance self-test, and those didn't log any errors either.


>> UUID=6f7c65b9-40b2-4b05-9157-522a67f65c4a /mnt/var_data xfs defaults,noatime,nodiratime,logbufs=8,logbsize=256k,logdev=/dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5 0 2
>>
>> I can't mount the FS anymore:
>> mount: Structure needs cleaning
>
> Obviously - you've got corrupted free space btrees thanks to the IO
> error during recovery and the later operations that were done on it.
> Now log recovery can't complete without hitting those corruptions.

Is the corruption in the log, or in the FS itself?
The FS itself could've been damaged when kicked out of the RAID.

>
>> So I used xfs_repair /dev/md1p2 -l /dev/sdi5 -L, and then I could mount the log.
>
> Did you replace the faulty disk? If not,this will just happen
> again...

Not yet, as I'm not sure if the problem was with the SATA cables
(and kicking out of sdd from the RAID), or the log disk itself.
And SMART shows no errors on the disk.

>
>> I did save the faulty log-file, let me know if you need it:
>> -rw-r--r-- 1 edwin edwin 2.9M Jul 5 09:00 sdi5.xz
>>
>> This is on a 3.0-rc5 kernel, my .config is below:
>>
>> I've run perf top with the hung shutdown, and it showed me something like this:
>> 1964.00 16.3% filemap_fdatawait_range /lib/modules/3.0.0-rc5/build/vmlinux
>> 1831.00 15.2% _raw_spin_lock /lib/modules/3.0.0-rc5/build/vmlinux
>> 1316.00 10.9% iput /lib/modules/3.0.0-rc5/build/vmlinux
>> 1265.00 10.5% _atomic_dec_and_lock /lib/modules/3.0.0-rc5/build/vmlinux
>> 998.00 8.3% _raw_spin_unlock /lib/modules/3.0.0-rc5/build/vmlinux
>> 731.00 6.1% sync_inodes_sb /lib/modules/3.0.0-rc5/build/vmlinux
>> 724.00 6.0% find_get_pages_tag /lib/modules/3.0.0-rc5/build/vmlinux
>> 580.00 4.8% radix_tree_gang_lookup_tag_slot /lib/modules/3.0.0-rc5/build/vmlinux
>> 525.00 4.3% __rcu_read_unlock /lib/modules/3.0.0-rc5/build/vmlinux
>
> Looks like it is running around trying to write back data, stuck
> somewhere in the code outside XFS. I haven't seen anything like this
> before.
>
> Still, the root cause is likely a bad disk or driver, so finding and
> fixing that is probably the first thing you should do...

OK, I'll see if this happens again, and I'll replace the disk if it does.

Although is there supposed to be a performance benefit from having a separate log disk with XFS?
IIRC it has a disadvantage that you can't use barriers properly.
But if I'd move the log to be stored with the FS (on the RAID10 array), will XFS be able to use barriers there,
or RAID still prevents barriers from working?

Thanks,
--Edwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/