Re: 2.6.22-rc3 hibernate(?) fails totally - regression (xfs on raid6)

From: David Greaves
Date: Tue Jun 12 2007 - 08:31:52 EST


[RESEND since I sent this late last Friday and it's probably been buried by now.]

I had this as a PS, then I thought, we could all be wasting our time...

I don't like these "Section mismatch" warnings but that's because I'm paranoid
rather than because I know what they mean. I'll be happier when someone says
"That's OK, I know about them, they're not the problem"

WARNING: arch/i386/kernel/built-in.o(.text+0x968f): Section mismatch: reference
to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0x9781): Section mismatch: reference
to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0x9786): Section mismatch: reference
to .init.text: (between 'mtrr_bp_init' and 'mtrr_ap_init')
WARNING: arch/i386/kernel/built-in.o(.text+0xa25c): Section mismatch: reference
to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa303): Section mismatch: reference
to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa31b): Section mismatch: reference
to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.text+0xa344): Section mismatch: reference
to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/i386/kernel/built-in.o(.exit.text+0x19): Section mismatch:
reference to .init.text: (between 'cache_remove_dev' and 'powernow_k6_exit')
WARNING: arch/i386/kernel/built-in.o(.data+0x2160): Section mismatch: reference
to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mce_work')
WARNING: kernel/built-in.o(.text+0x14502): Section mismatch: reference to
.init.text: (between 'kthreadd' and 'init_waitqueue_head')
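
For what it's worth, these warnings flag a reference from code or data that stays resident to something in .init.text, which the kernel frees once boot is finished. A rough way to see where they cluster is to count them per object file and section. This is only a sketch: the function name summarise_mismatches is made up for illustration, and it assumes the raw build log is fed in with each warning on a single line (not wrapped as in this mail).

```shell
#!/bin/sh
# Hypothetical helper: count "Section mismatch" warnings per object file
# and referencing section. Reads a build log on stdin.
summarise_mismatches() {
    # Keep only the warning lines and reduce each to "<object> <section>".
    sed -n 's/^WARNING: \([^(]*\)(\(\.[a-z.]*\)+0x[0-9a-f]*): Section mismatch.*/\1 \2/p' |
        sort | uniq -c |
        # Normalise the padded "uniq -c" output to "<count> <object> <section>".
        awk '{ print $1, $2, $3 }' |
        sort -rn
}
```

Usage would be something like "make 2>&1 | summarise_mismatches", with the most frequent offenders listed first.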

I'm paranoid because Andrew Morton said a couple of weeks ago:
Could the people who write these bugs, please, like, fix them?
It's not trivial noise. These things lead to kernel crashes.

Anyhow...

David Chinner wrote:
sync just guarantees that metadata changes are logged and data is
on disk - it doesn't stop the filesystem from doing anything after
the sync...
No, but there are no apps accessing the filesystem. It's just available for NFS
serving. Seems safer before potentially hanging the machine?


Also I made these changes to the kernel:
cu:/boot# diff config-2.6.22-rc4-TejuTst-dbg3-dirty
config-2.6.22-rc4-TejuTst-dbg1-dirty
3,4c3,4
< # Linux kernel version: 2.6.22-rc4-TejuTst-dbg3
< # Thu Jun 7 20:00:34 2007
---
> # Linux kernel version: 2.6.22-rc4-TejuTst3
> # Thu Jun 7 10:59:21 2007
242,244c242
< CONFIG_PM_DEBUG=y
< CONFIG_DISABLE_CONSOLE_SUSPEND=y
< # CONFIG_PM_TRACE is not set
---
> # CONFIG_PM_DEBUG is not set
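
A plain diff of two config files also picks up the version and timestamp header, which is noise here. A minimal sketch that compares only the option lines follows; config_diff is a hypothetical helper name, and it assumes mktemp is available.

```shell
#!/bin/sh
# Hypothetical helper: diff only the CONFIG_* lines of two kernel configs,
# ignoring header comments such as the version and build date.
config_diff() {
    pattern='^(CONFIG_[A-Za-z0-9_]+=|# CONFIG_[A-Za-z0-9_]+ is not set)'
    left=$(mktemp) || return 2
    right=$(mktemp) || { rm -f "$left"; return 2; }
    # Sort both sides so option order does not show up as a difference.
    grep -E "$pattern" "$1" | sort > "$left"
    grep -E "$pattern" "$2" | sort > "$right"
    diff "$left" "$right"
    status=$?
    rm -f "$left" "$right"
    return "$status"
}
```

Called as "config_diff config-A config-B", it returns diff's exit status, so 0 means the two configs set the same options.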

positive: I can now get sysrq-t :)
negative: if I build skge into the kernel the behaviour changes so I can't run
netconsole

Just to be sure, I tested: this kernel suspends/restores with /huge unmounted.
It also hangs without an umount, so the behaviour is the same.

Ok, so a clean inode is sufficient to prevent hibernate from working.

So, what's different between a sync and a remount?

do_remount_sb() does:

    shrink_dcache_sb(sb);
    fsync_super(sb);

of which a sync does neither. sync does what fsync_super() does in a
different sort of way, but does not call sync_blockdev() on each
block device. Those look like the two main differences between
sync and remount - remount trims the dentry cache and syncs the blockdev,
sync doesn't.

What about freezing the filesystem?
cu:~# xfs_freeze -f /huge
cu:~# /usr/net/bin/hibernate
[but this doesn't even hibernate - same as the 'touch']

I suspect that the frozen filesystem might cause other problems
in the hibernate process. However, while a freeze calls sync_blockdev()
it does not trim the dentry cache.....

So, rather than a remount before hibernate, let's see if we can remove the
dentries some other way to determine if removing excess
dentries/inodes from the caches makes a difference. Can you do:

# touch /huge/foo
# sync
# echo 1 > /proc/sys/vm/drop_caches
# hibernate
success

# touch /huge/bar
# sync
# echo 2 > /proc/sys/vm/drop_caches
# hibernate
success

# touch /huge/baz
# sync
# echo 3 > /proc/sys/vm/drop_caches
# hibernate
success

So I added
# touch /huge/bork
# sync
# hibernate

And it still succeeded - sigh.

So I thought a bit and did:
rm /huge/b* /huge/foo

Clean boot
# touch /huge/bar
# sync
# echo 2 > /proc/sys/vm/drop_caches
# hibernate
hangs on suspend (sysrq-b doesn't work)

Clean boot
# touch /huge/baz
# sync
# echo 3 > /proc/sys/vm/drop_caches
# hibernate
hangs on suspend (sysrq-b doesn't work)
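
The sequences above differ only in the file touched and the value written to drop_caches (1 drops the page cache, 2 drops reclaimable dentries and inodes, 3 drops both). Since the outcome changed between a clean boot and a reused session, each variant probably wants a fresh boot. A sketch of a repeatable wrapper follows; the names run and hibernate_test and the RUN and HIBERNATE variables are made up for illustration, and it only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print a command (dry run) or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

# hibernate_test FILE [MODE]
#   FILE  file to touch on the filesystem under test
#   MODE  value for /proc/sys/vm/drop_caches (1, 2 or 3); omit to skip
hibernate_test() {
    file=$1
    mode=${2:-}
    run "touch $file"
    run "sync"
    if [ -n "$mode" ]; then
        run "echo $mode > /proc/sys/vm/drop_caches"
    fi
    # HIBERNATE is an assumed override for the hibernate command to use.
    run "${HIBERNATE:-hibernate}"
}
```

For example, "hibernate_test /huge/bar 2" prints the four commands of the second variant; running it for real needs root and RUN=1.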

So I rebooted and hibernated to make sure I'm not having random behaviour - yep,
hang on resume (as per usual).

Now I wonder if any other mounts have an effect...
reboot and umount the xfs fs on /dev/hdb2 - hang on hibernate


I'm confused. I'm going to order Chinese takeaway and then find a serial cable...

David
PS 2.6.21.1 works fine.
PPS the takeaway was nice.


