Re: kvm deadlock

From: Avi Kivity
Date: Wed Dec 14 2011 - 08:43:23 EST


On 12/14/2011 02:25 PM, Marcelo Tosatti wrote:
> On Mon, Dec 05, 2011 at 04:48:16PM -0600, Nate Custer wrote:
> > Hello,
> >
> > I am struggling with repeatable full hardware locks when running 8-12 KVM vms. At some point before the hard lock I get an inconsistent lock state warning. An example of this can be found here:
> >
> > http://pastebin.com/8wKhgE2C
> >
> > After that the server continues to run for a while and then starts its death spiral. When it reaches that point it fails to log anything further to the disk, but by attaching a console I have been able to get a stack trace documenting the final implosion:
> >
> > http://pastebin.com/PbcN76bd
> >
> > All of the cores end up hung and the server stops responding to all input, including SysRq commands.
> >
> > I have seen this behavior on two machines (dual E5606 running Fedora 16); both passed cpuburnin testing and memtest86 scans without error.
> >
> > I have reproduced the crash and stack traces with a Fedora debugging kernel (3.1.2-1) and with a vanilla 3.1.4 kernel.
>
> Busted hardware, apparently. Can you reproduce these issues with the
> same workload on different hardware?

I don't think it's hardware related. The second trace (in the first
paste) is called during swap, so GFP_FS is set. The first one is not,
so GFP_FS is clear. Lockdep is worried about the following scenario:

    acpi_early_init() is called
    calls pcpu_alloc(), which takes pcpu_alloc_mutex
    eventually, calls kmalloc(), or some other allocation function
    no memory, so swap
    call try_to_free_pages()
    submit_bio()
    blk_throtl_bio()
    blkio_alloc_blkg_stats()
    alloc_percpu()
    pcpu_alloc(), which takes pcpu_alloc_mutex
    deadlock

It's a little unlikely that acpi_early_init() will OOM, but lockdep
doesn't know that. Other callers of pcpu_alloc() could trigger the same
thing.
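
To make the chain concrete, here is a minimal userspace sketch in Python, not kernel code. It assumes a single thread and models pcpu_alloc_mutex with a non-recursive threading.Lock. The memory_is_low and in_reclaim parameters and the events list are hypothetical, added only for illustration. The inner acquire is non-blocking so that the sketch reports the self-deadlock instead of hanging.

```python
import threading

# Stand-in for pcpu_alloc_mutex. threading.Lock is not recursive,
# so a second acquire from the same thread can never succeed.
pcpu_alloc_mutex = threading.Lock()
events = []  # hypothetical trace of what happened


def pcpu_alloc(memory_is_low, in_reclaim=False):
    # Non-blocking acquire: report the deadlock instead of hanging.
    if not pcpu_alloc_mutex.acquire(blocking=False):
        events.append("deadlock: pcpu_alloc_mutex is already held")
        return False
    try:
        events.append("took pcpu_alloc_mutex")
        if memory_is_low and not in_reclaim:
            # The nested allocation finds no memory and starts reclaim
            # while the mutex is still held.
            return try_to_free_pages()
        return True
    finally:
        pcpu_alloc_mutex.release()


def try_to_free_pages():
    # Stands for: submit_bio() -> blk_throtl_bio()
    #   -> blkio_alloc_blkg_stats() -> alloc_percpu() -> pcpu_alloc()
    return pcpu_alloc(memory_is_low=False, in_reclaim=True)
```

Calling pcpu_alloc(memory_is_low=False) succeeds, while pcpu_alloc(memory_is_low=True) re-enters pcpu_alloc() through the reclaim path and fails on the mutex it already holds.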

When lockdep says

[ 5839.924953] other info that might help us debug this:
[ 5839.925396]  Possible unsafe locking scenario:
[ 5839.925397]
[ 5839.925840]        CPU0
[ 5839.926063]        ----
[ 5839.926287]   lock(pcpu_alloc_mutex);
[ 5839.926533]   <Interrupt>
[ 5839.926756]     lock(pcpu_alloc_mutex);
[ 5839.926986]

the <Interrupt> line really means

<swap, set GFP_FS>

GFP_FS simply marks the beginning of a nested, unrelated context that
uses the same thread, just like an interrupt. Kudos to lockdep for
catching that.
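
Why lockdep can warn before any deadlock actually happens can be sketched as follows. This is a toy model, not how lockdep is implemented: the class MiniLockdep and all of its method and attribute names are hypothetical. It only records two facts per lock (whether it was ever held at a point where the nested context could start, and whether it was ever taken inside that context) and complains as soon as both are true, even if the two observations come from unrelated code paths at different times.

```python
class MiniLockdep:
    """Toy usage-state tracker for a single thread (hypothetical names)."""

    def __init__(self):
        self.held = []              # locks currently held
        self.nested = False         # True while inside the nested context
        self.held_at_entry = set()  # ever held when the nested context could start
        self.taken_inside = set()   # ever taken inside the nested context
        self.warnings = []

    def acquire(self, name):
        if self.nested:
            self.taken_inside.add(name)
        self.held.append(name)
        self._check(name)

    def release(self, name):
        self.held.remove(name)

    def may_enter_nested_context(self):
        # An allocation that is allowed to start reclaim: every lock
        # held right now could be re-taken from the nested context.
        for name in self.held:
            self.held_at_entry.add(name)
            self._check(name)

    def _check(self, name):
        bad = name in self.held_at_entry and name in self.taken_inside
        if bad and name not in self.warnings:
            self.warnings.append(name)


dep = MiniLockdep()

# Path 1: hold the mutex across an allocation that may reclaim.
dep.acquire("pcpu_alloc_mutex")
dep.may_enter_nested_context()
dep.release("pcpu_alloc_mutex")

# Path 2, at some other time: take the same mutex from inside reclaim.
dep.nested = True
dep.acquire("pcpu_alloc_mutex")
dep.release("pcpu_alloc_mutex")
dep.nested = False

print(dep.warnings)  # ['pcpu_alloc_mutex']
```

Neither path deadlocks on its own in this run; the warning comes purely from combining the two observations.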

I think the allocation in blkio_alloc_blkg_stats() should be moved out
of the I/O path into some init function. Copying Jens.
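
The shape of that suggestion, as a rough userspace sketch only: allocate the per-cpu stats once at init time, so the I/O path touches preallocated memory and never takes the allocator mutex. The BlkioGroup class, its throttle_bio() method and the ncpus parameter are hypothetical stand-ins, and this is not a claim about how the kernel code should be or was changed.

```python
import threading

# Stand-in for pcpu_alloc_mutex (non-recursive).
pcpu_alloc_mutex = threading.Lock()


def alloc_percpu(ncpus):
    # Every per-cpu allocation serializes on the allocator mutex.
    with pcpu_alloc_mutex:
        return [0] * ncpus


class BlkioGroup:
    """Hypothetical group object with per-cpu byte counters."""

    def __init__(self, ncpus):
        # Init path: allocate here, outside of reclaim and I/O.
        self.stats = alloc_percpu(ncpus)

    def throttle_bio(self, cpu, nbytes):
        # I/O path: update preallocated counters only, no allocation
        # and therefore no pcpu_alloc_mutex.
        self.stats[cpu] += nbytes
```

With this layout, throttle_bio() still works while another caller in the same thread holds pcpu_alloc_mutex, which is the situation that deadlocks when the allocation is done lazily in the I/O path.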

--
error compiling committee.c: too many arguments to function
