Re: 2.6.34 echo j > /proc/sysrq-trigger causes infinite unfreeze/Thaw event

From: Dave Chinner
Date: Sun Jun 06 2010 - 21:06:21 EST


On Thu, Jun 03, 2010 at 11:30:30PM -0600, Jeffrey Merkey wrote:
> causes the FS Thaw stuff in fs/buffer.c to enter an infinite loop
> filling the /var/log/messages with junk and causing the hard drive to
> crank away endlessly.

Hmmm, looks pretty obvious what the 2.6.34 bug is:

	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
		printk(KERN_WARNING "Emergency Thaw on %s\n",
		       bdevname(sb->s_bdev, b));

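thaw_bdev() returns 0 on success or if the device is not frozen, and
returns non-zero only if the unfreeze failed. So the loop condition is
inverted: it keeps spinning for as long as the thaw succeeds. Looks
like it was broken from the start to me.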
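
For illustration, a minimal user-space sketch of that return convention
and of why the negated test spins. Everything here (struct fake_sb,
fake_thaw_bdev(), the iteration cap) is a hypothetical stand-in, not
kernel code; the cap exists only so the buggy variant terminates, and
the bounded loop is one possible shape rather than a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a superblock; not the kernel's struct. */
struct fake_sb {
	int freeze_count;	/* nesting depth of outstanding freezes */
	bool thaw_fails;	/* force the "unfreeze failed" path */
};

/*
 * Models the convention described above: 0 on success or when
 * nothing is frozen, non-zero only if the unfreeze failed.
 */
static int fake_thaw_bdev(struct fake_sb *sb)
{
	assert(sb->freeze_count >= 0);
	if (sb->freeze_count == 0)
		return 0;	/* not frozen: still reported as success */
	if (sb->thaw_fails)
		return -1;
	sb->freeze_count--;
	return 0;
}

/* Loop shaped like the 2.6.34 one; returns the iterations used. */
static int buggy_emergency_thaw(struct fake_sb *sb, int cap)
{
	int n = 0;

	while (n < cap && !fake_thaw_bdev(sb))
		n++;	/* success (0) keeps the loop going */
	return n;
}

/* One possible shape: thaw until no freeze is left or a thaw fails. */
static int bounded_emergency_thaw(struct fake_sb *sb)
{
	int n = 0;

	while (sb->freeze_count > 0 && fake_thaw_bdev(sb) == 0)
		n++;
	return n;
}
```

With freeze_count = 2, the buggy variant only stops at the cap, while
the bounded variant stops after two thaws.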

Fixing that endless loop shows some other problems on 2.6.35,
though: the emergency unfreeze is not unfreezing frozen XFS
filesystems. This appears to be caused by
18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2 ("Introduce freeze_super
and thaw_super for the fsfreeze ioctl").

It appears that this introduces a significant mismatch between the
bdev freeze/thaw and the super freeze/thaw. That is, if you freeze
with the sb method, you can only unfreeze via the sb method.
However, if you freeze via the bdev method, you can unfreeze by
either the bdev or sb method. This breaks the nesting of the
freeze/thaw operations between dm and userspace, which can lead to
premature thawing of the filesystem.
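
A toy model of that mismatch, as a sketch only. It assumes the bdev
path keeps a nesting count while the sb path just flips a frozen flag;
struct toy_fs and all toy_* functions are hypothetical names and do not
reproduce the real freeze_bdev()/thaw_bdev()/freeze_super()/thaw_super()
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one filesystem with two freeze entry points. */
struct toy_fs {
	int bdev_freeze_count;	/* nesting count kept by the bdev path */
	bool sb_frozen;		/* actual frozen state of the superblock */
};

/* bdev path: nests, and freezes the sb on the first call. */
static void toy_freeze_bdev(struct toy_fs *fs)
{
	if (fs->bdev_freeze_count++ == 0)
		fs->sb_frozen = true;
}

/* bdev path: returns 0 if not frozen via bdev; thaws on the last call. */
static int toy_thaw_bdev(struct toy_fs *fs)
{
	assert(fs->bdev_freeze_count >= 0);
	if (fs->bdev_freeze_count == 0)
		return 0;	/* an sb-only freeze is invisible here */
	if (--fs->bdev_freeze_count == 0)
		fs->sb_frozen = false;
	return 0;
}

/* sb path: does not nest and knows nothing about the bdev count. */
static int toy_freeze_super(struct toy_fs *fs)
{
	if (fs->sb_frozen)
		return -1;
	fs->sb_frozen = true;
	return 0;
}

/* sb path: clears the frozen flag regardless of bdev_freeze_count. */
static int toy_thaw_super(struct toy_fs *fs)
{
	if (!fs->sb_frozen)
		return -1;
	fs->sb_frozen = false;
	return 0;
}
```

In this model a toy_freeze_super() followed by toy_thaw_bdev() leaves
the filesystem frozen, and two nested toy_freeze_bdev() calls are
undone by a single toy_thaw_super(), which is the premature thaw.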

Then there is this deadlock:

iterate_supers(do_thaw_one) does:

	down_read(&sb->s_umount);
	do_thaw_one(sb)
	  thaw_bdev(sb->s_bdev, sb)
	    thaw_super(sb)
	      down_write(&sb->s_umount);

Which is an instant deadlock: down_write() on s_umount can never
succeed while the same task is still holding it for read.
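
The same lock ordering can be sketched in user space with a toy
reader/writer semaphore that reports failure where the real one would
sleep. struct toy_rwsem and the toy_* helpers are hypothetical and only
model the read-then-write acquisition by a single task; they are not
the kernel rwsem API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer semaphore: counts holders instead of sleeping. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;	/* any holder, even ourselves, blocks a writer */
	s->writer = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Stand-in for the thaw path that wants s_umount for write. */
static bool toy_thaw_super(struct toy_rwsem *s_umount)
{
	if (!toy_down_write_trylock(s_umount))
		return false;	/* a blocking down_write() would hang here */
	toy_up_write(s_umount);
	return true;
}

/* Stand-in for the iterator that calls the thaw with s_umount read-held. */
static bool toy_iterate_thaw(struct toy_rwsem *s_umount)
{
	bool ok;

	if (!toy_down_read_trylock(s_umount))
		return false;
	ok = toy_thaw_super(s_umount);
	toy_up_read(s_umount);
	return ok;
}
```

Called directly, toy_thaw_super() gets the write lock; called through
toy_iterate_thaw(), it cannot, because the read hold taken by the
caller is still outstanding.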

These problems were hidden by the fact that the emergency thaw code
was not getting past the thaw_bdev guards and so not triggering
this deadlock.

Al, Josef, what's the best way to fix this mess?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx