Re: MMC: fix hang if card was removed during suspend and unsafe resume was enabled

From: Eric Miao
Date: Tue Apr 13 2010 - 17:20:14 EST


On Feb 6, 4:00 am, Maxim Levitsky <maximlevit...@xxxxxxxxx> wrote:
> On Fri, 2010-02-05 at 10:26 -0800, Andrew Morton wrote:
> > On Fri, 05 Feb 2010 17:52:00 +0200
> > Maxim Levitsky <maximlevit...@xxxxxxxxx> wrote:
>
> > > > > <4>[15241.042047]  [<ffffffff8106620a>] ? prepare_to_wait+0x2a/0x90
> > > > > <4>[15241.042159]  [<ffffffff810790bd>] ? trace_hardirqs_on+0xd/0x10
> > > > > <4>[15241.042271]  [<ffffffff8140db12>] ? _raw_spin_unlock_irqrestore+0x42/0x80
> > > > > <4>[15241.042386]  [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > > > <4>[15241.042496]  [<ffffffff8112a39e>] bdi_sched_wait+0xe/0x20
> > > > > <4>[15241.042606]  [<ffffffff8140af6f>] __wait_on_bit+0x5f/0x90
> > > > > <4>[15241.042714]  [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
> > > > > <4>[15241.042824]  [<ffffffff8140b018>] out_of_line_wait_on_bit+0x78/0x90
> > > > > <4>[15241.042935]  [<ffffffff81065fd0>] ? wake_bit_function+0x0/0x40
> > > > > <4>[15241.043045]  [<ffffffff8112a2d3>] ? bdi_queue_work+0xa3/0xe0
> > > > > <4>[15241.043155]  [<ffffffff8112a37f>] bdi_sync_writeback+0x6f/0x80
> > > > > <4>[15241.043265]  [<ffffffff8112a3d2>] sync_inodes_sb+0x22/0x120
> > > > > <4>[15241.043375]  [<ffffffff8112f1d2>] __sync_filesystem+0x82/0x90
> > > > > <4>[15241.043485]  [<ffffffff8112f3db>] sync_filesystem+0x4b/0x70
> > > > > <4>[15241.043594]  [<ffffffff811391de>] fsync_bdev+0x2e/0x60
> > > > > <4>[15241.043704]  [<ffffffff812226be>] invalidate_partition+0x2e/0x50
> > > > > <4>[15241.043816]  [<ffffffff8116b92f>] del_gendisk+0x3f/0x140
> > > > > <4>[15241.043926]  [<ffffffffa00c0233>] mmc_blk_remove+0x33/0x60 [mmc_block]
> > > > > <4>[15241.044043]  [<ffffffff81338977>] mmc_bus_remove+0x17/0x20
> > > > > <4>[15241.044152]  [<ffffffff812ce746>] __device_release_driver+0x66/0xc0
> > > > > <4>[15241.044264]  [<ffffffff812ce89d>] device_release_driver+0x2d/0x40
> > > > > <4>[15241.044375]  [<ffffffff812cd9b5>] bus_remove_device+0xb5/0x120
> > > > > <4>[15241.044486]  [<ffffffff812cb46f>] device_del+0x12f/0x1a0
> > > > > <4>[15241.044593]  [<ffffffff81338a5b>] mmc_remove_card+0x5b/0x90
> > > > > <4>[15241.044702]  [<ffffffff8133ac27>] mmc_sd_remove+0x27/0x50
> > > > > <4>[15241.044811]  [<ffffffff81337d8c>] mmc_resume_host+0x10c/0x140
> > > > > <4>[15241.044929]  [<ffffffffa00850e9>] sdhci_resume_host+0x69/0xa0 [sdhci]
> > > > > <4>[15241.045044]  [<ffffffffa0bdc39e>] sdhci_pci_resume+0x8e/0xb0 [sdhci_pci]
>
> > > > So what's the hang?  del_gendisk is doing IO?  I'd assumed that it was
> > > > because it was calling kobject_uevent, but userspace is frozen.
>
> > > This is a backtrace of a hang.
>
> > But why did it hang?  Because the BDI worker threads are trying to
> > perform IO through a suspended device?
>
> Something like that I guess.
> Also this is 100% reproducible, and I can reproduce this with my own
> driver too (by making the card detection workqueue be non freezable)
>

It looks to me like the bdi code is waiting for the writeback task to
finish, yet processes are frozen at that point, so the task never runs
and the wait hangs.
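
To make the suspected mechanism concrete, here is a toy user-space model
of it. This is not kernel code, and every name in it (wb_worker,
wb_worker_tick, sync_wait, max_ticks) is made up for illustration. It
only assumes what the trace suggests: a caller waits synchronously for
queued writeback work, and the worker that would drain that work cannot
run while it is frozen.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, not kernel code; all names are hypothetical. */
struct wb_worker {
    bool frozen;   /* true while tasks are frozen for suspend */
    int  pending;  /* writeback work items still queued */
};

/* One scheduling opportunity for the worker; returns true on progress. */
static bool wb_worker_tick(struct wb_worker *w)
{
    if (w->frozen || w->pending == 0)
        return false;
    w->pending--;
    return true;
}

/*
 * Model of a synchronous "wait until writeback is done" call.
 * The wait in the trace appears to be unbounded; max_ticks exists here
 * only so that the model terminates. Returns true if the queue drained,
 * false if the waiter would still be blocked.
 */
static bool sync_wait(struct wb_worker *w, int max_ticks)
{
    assert(max_ticks >= 0);
    for (int i = 0; i < max_ticks && w->pending > 0; i++)
        wb_worker_tick(w);
    return w->pending == 0;
}
```

With frozen set to false, sync_wait() returns true once the pending
items are drained. With frozen set to true it returns false however
large max_ticks is, which is the hang in miniature.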

I can also confirm that this happens every time. Without
MMC_UNSAFE_RESUME, it happens during suspend, when the mmc core tries
to remove the card. With MMC_UNSAFE_RESUME, it happens during resume if
the card was removed while the system was suspended.

However, the root cause looks to me to be that del_gendisk() is not
safe to call in suspend context, so a clean fix probably belongs
somewhere in the generic disk layer. Skipping card removal during
suspend is, IMHO, not a clean enough fix for this problem.

I might be able to avoid this issue by removing the card from the
user-space pm scripts, but it would be a shame if this could not be
fixed cleanly within the kernel.