Lockups using per-thread cgs and kvm

From: Andrey Korolyov
Date: Tue Feb 12 2013 - 05:26:44 EST


We (a cloud hosting provider) have recently observed a couple of
strange lockups when a physical node runs a significant number of
Win2008R2 kvm appliances; a collection of those lockups can be seen at
the link below. After checking a lot of ideas without any useful
result, I suspected that the nested per-thread cgroup placement
created by libvirt may lead to this problem (libvirt puts the emulator
and each of the vcpu threads into a separate sub-cgroup). Disabling
this behavior, i.e. having only one cgroup per kvm process per cgroup
type, solved the problem; at least it didn't happen in the most
stressful tests we're able to run. Since it is generally unusual for a
well-known kernel mechanism such as cgroups to break in a way like
this, I hope we've found a quite rare kind of bug. Just for the record,
the bug may also happen with a Linux guest, but much more rarely, by
one or two orders of magnitude. We stayed on the default scheduler
granularity value in these tests, if it matters.
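
For anyone who wants to check whether a given kvm process is using the
per-thread placement described above, here is a minimal sketch that
groups the threads of a process by cgroup path. It relies only on the
standard /proc/<pid>/task/<tid>/cgroup files; the function names and the
sample paths in the comments are illustrative assumptions, not taken
from libvirt or from our setup.

```python
import os
from collections import defaultdict


def parse_cgroup_text(text):
    """Parse the contents of a /proc/<pid>/task/<tid>/cgroup file.

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    Returns a dict mapping the controller-list string to the cgroup path.
    """
    result = {}
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) == 3:
            result[parts[1]] = parts[2]
    return result


def threads_by_cgroup(pid, controller="cpu"):
    """Group the thread IDs of `pid` by cgroup path for one controller.

    With per-thread placement, one would expect several distinct paths
    (hypothetically something like '.../vcpu0', '.../emulator'); with a
    single cgroup per process, all threads share one path.
    """
    groups = defaultdict(list)
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "cgroup")) as f:
                placement = parse_cgroup_text(f.read())
        except OSError:
            continue  # the thread exited while we were scanning
        for controller_list, path in placement.items():
            if controller in controller_list.split(","):
                groups[path].append(int(tid))
    return dict(groups)
```

Calling threads_by_cgroup(<pid of the kvm process>) and finding more
than one key in the result would suggest that the threads are spread
over separate sub-cgroups for that controller.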

For anyone who wants to see the entire timeline of this bug, please see [1].

[1]. http://www.spinics.net/lists/kvm/msg85956.html