Re: 2.6.32-rc5: surprise removal of USB mass storage, and whole system goes to hell

From: Jens Axboe
Date: Thu Oct 29 2009 - 04:38:26 EST


On Wed, Oct 28 2009, Jens Axboe wrote:
> On Wed, Oct 28 2009, Jens Axboe wrote:
> > On Wed, Oct 28 2009, Jiri Kosina wrote:
> > > On Tue, 27 Oct 2009, Pavel Machek wrote:
> > >
> > > > I did remove one harddrive w/o unmounting, and now the whole system
> > > > becomes unusable :-(: (whole dmesg attached).
> > > >
> > > > Stuff like "sync" hangs, and I'll probably have to reboot soon.
> > >
> > > From the traces it seems that it might be related to the new per-bdi
> > > writeback stuff ... adding Jens to CC.
> >
> > It looks like the IO isn't being errored on the device side, or perhaps
> > it just got stuck. Pavel, if you can reproduce, please try with this
> > tracing patch. Apply it, and then do something ala:
> >
> > # cd /sys/kernel/debug/tracing
> > # echo 0 > events/enable
> > # echo 1 > events/writeback/enable
> > # echo 0 > trace
> >
> > then start the act of reproducing, and finally
> >
> > # cat trace > /tmp/foo
> >
> > and send the output of foo here. Thanks!
>
> I can reproduce this. The writeback work gets queued, we notice the task
> isn't there and wake up the default task. And then nothing happens, I
> wonder if the bdi is gone.
>
> I'll fiddle around with this.

The problem is that we cannot control when the bdi disappears, and
that is exactly what happens when the device is yanked. The bug was
introduced with the addition of the sb->s_bdi cache pointer: after a
surprise removal it still points to a bdi that is gone (and whose
memory has been freed).

Pavel, can you try this?

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 4f53a6d..756c31b 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -614,6 +616,18 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 		kthread_stop(wb->task);
 }

+static void bdi_prune_sb(struct backing_dev_info *bdi)
+{
+	struct super_block *sb;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (sb->s_bdi == bdi)
+			sb->s_bdi = NULL;
+	}
+	spin_unlock(&sb_lock);
+}
+
 void bdi_unregister(struct backing_dev_info *bdi)
 {
 	if (bdi->dev) {
@@ -624,6 +638,8 @@ void bdi_unregister(struct backing_dev_info *bdi)
 		device_unregister(bdi->dev);
 		bdi->dev = NULL;
 	}
+
+	bdi_prune_sb(bdi);
 }
 EXPORT_SYMBOL(bdi_unregister);
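
For readers who want to poke at the idea outside the kernel, here is a
rough userspace sketch of what the patch does: walk the superblocks and
clear any cached s_bdi pointer that matches the bdi being torn down,
before its memory goes away. Everything in it is an assumption made for
illustration: the structs are trimmed stand-ins rather than the kernel
definitions, prune_sb_model(), unregister_model() and sb_has_bdi() are
hypothetical names, and there is no locking (the patch does the walk
under sb_lock).

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for illustration; NOT the kernel definitions. */
struct backing_dev_info {
	int id;
};

struct super_block {
	struct super_block *next;        /* stand-in for the s_list linkage */
	struct backing_dev_info *s_bdi;  /* cached pointer that can go stale */
};

/*
 * Clear every cached pointer to @bdi and return how many were cleared.
 * The kernel patch does this walk under sb_lock; this single-threaded
 * model has no locking.
 */
static int prune_sb_model(struct super_block *head,
			  struct backing_dev_info *bdi)
{
	int cleared = 0;
	struct super_block *sb;

	for (sb = head; sb != NULL; sb = sb->next) {
		if (sb->s_bdi == bdi) {
			sb->s_bdi = NULL;
			cleared++;
		}
	}
	return cleared;
}

/*
 * Hypothetical teardown: prune first, then free, so no superblock is
 * left holding a pointer to freed memory.
 */
static int unregister_model(struct super_block *head,
			    struct backing_dev_info *bdi)
{
	int cleared = prune_sb_model(head, bdi);

	free(bdi);
	return cleared;
}

/* Hypothetical consumer check: NULL means the device is gone. */
static int sb_has_bdi(const struct super_block *sb)
{
	return sb->s_bdi != NULL;
}
```

To try it, build a short list of struct super_block entries from a
main(), point some of them at a malloc()ed bdi, call unregister_model()
and check sb_has_bdi() on each entry. The flip side of this approach is
that every user of s_bdi then has to cope with a NULL pointer.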


--
Jens Axboe
