SUMMARY: KVM+nf_conntrack_htable_size

From: Jon Masters
Date: Sun Jan 31 2010 - 04:10:53 EST


Folks,

Thanks to everyone who helped me poke a little at the netfilter code.
Since I'm not usually a network guy and haven't really kept up with
things like the namespace code, this was fun. I have some results.

The problem (as Eric hinted) is that we have a global
nf_conntrack_htable_size, which is manipulated every time we create a
new hashtable. This used to happen only once, but now that we have
multiple network namespaces, it will happen every time we create a new
namespace due to the code registering with register_pernet_subsys. At
that point, the size may be changed underneath code that is currently
using it for another hashtable instance. Additionally,
the "resize" code (via module parameter or via sysctl) only changes the
root netns hashtable, then changes this value, also busticating stuff.
Finally, the very fact that this variable is exported directly is
*asking* for trouble and random corruption to happen later on.
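
To make the hazard concrete, here is a rough userspace sketch (not
kernel code). Everything in it is an assumption made for illustration:
the model_* names are hypothetical, global_htable_size merely stands in
for nf_conntrack_htable_size, and the modulo is a placeholder for
whatever the real hash-to-bucket mapping is. It only shows how a second
table allocation can leave the first table indexed with a size it was
never allocated for.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only: stands in for the exported global size. */
static unsigned int global_htable_size;

struct model_table {
    unsigned int allocated;   /* bucket count this table really has */
    int *buckets;
};

/* Allocate a table and, as a side effect, overwrite the shared size. */
static int model_table_alloc(struct model_table *t, unsigned int size)
{
    assert(size > 0);
    t->buckets = calloc(size, sizeof(*t->buckets));
    if (!t->buckets)
        return -1;
    t->allocated = size;
    global_htable_size = size;   /* the problematic global write */
    return 0;
}

static void model_table_free(struct model_table *t)
{
    free(t->buckets);
    t->buckets = NULL;
    t->allocated = 0;
}

/* Bucket index derived from the global, not from the table itself. */
static unsigned int model_bucket(unsigned int hash)
{
    return hash % global_htable_size;
}

/* Returns 1 if that index would fall outside this table's allocation. */
static int model_would_overrun(const struct model_table *t,
                               unsigned int hash)
{
    return model_bucket(hash) >= t->allocated;
}
```

In this model, allocating a 16-bucket table and then a 1024-bucket one
means a lookup with hash 100 against the first table computes index 100,
well past its 16 buckets. The sketch only reports the overrun rather
than dereferencing it.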

So, there are a great many issues with how the conntrack hashtables are
managed (it looks literally as if it used to be fine, then the namespace
code came along and broke assumptions that had been there since the start),
and they should not be considered safe for use with multiple namespaces
in my opinion. The solution would seem to be to remove this as a global
and make it per namespace, or even per hashtable (there is usually a 1:1
mapping, but the expect code uses this variable too).
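
As a minimal sketch of the per-namespace variant, again in plain
userspace C and under the same caveats: struct model_netns and its
helpers are hypothetical names, not existing kernel structures, and the
modulo is only a placeholder mapping. The point is simply that the size
lives next to the table it describes, so creating or resizing one
namespace's table cannot change how another is indexed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-namespace state: the size travels with its table. */
struct model_netns {
    unsigned int htable_size;
    int *buckets;
};

static int model_netns_init(struct model_netns *ns, unsigned int size)
{
    assert(size > 0);
    ns->buckets = calloc(size, sizeof(*ns->buckets));
    if (!ns->buckets)
        return -1;
    ns->htable_size = size;
    return 0;
}

/*
 * Resize touches only this namespace. A real implementation would also
 * rehash existing entries and need locking; both are omitted here.
 */
static int model_netns_resize(struct model_netns *ns, unsigned int new_size)
{
    int *fresh;

    assert(new_size > 0);
    fresh = calloc(new_size, sizeof(*fresh));
    if (!fresh)
        return -1;
    free(ns->buckets);
    ns->buckets = fresh;
    ns->htable_size = new_size;
    return 0;
}

/* Bucket index always uses the size of the table being indexed. */
static unsigned int model_netns_bucket(const struct model_netns *ns,
                                       unsigned int hash)
{
    return hash % ns->htable_size;
}

static void model_netns_exit(struct model_netns *ns)
{
    free(ns->buckets);
    ns->buckets = NULL;
    ns->htable_size = 0;
}
```

With two namespaces initialised at 16 and 1024 buckets, each keeps
indexing within its own allocation, and resizing the first leaves the
second untouched. The per-hashtable alternative would look much the
same, with the size field moved into a table structure that the expect
code could carry separately.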

Can someone advise which of these solutions you prefer, or whether you
have an alternative in mind? I think for now I'll hack up something
involving per-namespace hashtable sizes.

Jon.

