Re: [RFC PATCH net-next v2 0/5] netns: allow to identify peer netns

From: Nicolas Dichtel
Date: Thu Oct 02 2014 - 09:46:23 EST


On 29/09/2014 20:43, Eric W. Biederman wrote:
Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> writes:

On 26/09/2014 20:57, Eric W. Biederman wrote:
Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:

On Fri, Sep 26, 2014 at 11:10 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> writes:

The goal of this series is to be able to multicast netlink messages with an
attribute that identifies a peer netns.
This is needed by userland to interpret some of the information contained in
netlink messages (like the IFLA_LINK value, but also some other attributes in case
of x-netns netdevice (see also
http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).

I want to say that the problem addressed by patch 3/5 of this series is a
fundamentally valid problem. We have network objects spanning network
namespaces and it would be very nice to be able to talk about them in
netlink, and file descriptors are too local and arguably too heavyweight
for netlink queries and especially for netlink broadcast messages.

Furthermore the internal concept of peernet2id seems valid.

However what you do not address is a way for CRIU (aka process
migration) to be able to restore these ids after process migration.
Going farther it looks like you are actively breaking process migration
at this time, making this set of patches a no-go.
Ok, I will look more deeply into CRIU.


When adding a new form of namespace id CRIU patches are just about
as necessary as iproute patches.
Noted.



That does not describe what you have actually implemented in the
patches.

I see two ways to go with this.

- A per network namespace table in which you can store ids for ``peer''
network namespaces. The table would need to be populated manually by
the likes of ip netns add.

That flips the order of assignment and makes this idea solid.
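
As a rough illustration of this first option (a sketch only, not the code
from the patches), the per-netns table can be modelled as a mapping that is
filled in explicitly by userspace. The names Netns, set_peer_id and
peer_to_id are hypothetical; only the idea of a table populated by something
like ip netns add comes from the discussion above. Because the ids are
assigned from outside, a restore tool could presumably replay the same
assignments after migration.

```python
class Netns:
    """Stand-in for a network namespace (hypothetical model, not kernel code)."""

    def __init__(self, name):
        self.name = name
        # Per-namespace table: peer Netns object -> id chosen by userspace.
        self.peer_ids = {}

    def set_peer_id(self, peer, nsid):
        """Record the id under which this namespace refers to `peer`."""
        if nsid in self.peer_ids.values():
            raise ValueError(f"id {nsid} is already used in netns {self.name}")
        self.peer_ids[peer] = nsid

    def peer_to_id(self, peer):
        """Return the id assigned to `peer` in this namespace, or None if unset."""
        return self.peer_ids.get(peer)


link = Netns("link")
blue = Netns("blue")
blue.set_peer_id(link, 1)        # assigned explicitly by the caller
print(blue.peer_to_id(link))     # id of "link" as seen from "blue"
print(link.peer_to_id(blue))     # nothing was assigned in the other direction
```

Note that the ids are local to each table: "blue" knowing "link" as 1 says
nothing about how "link" refers to "blue".
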
I have a preference for this solution, because it allows full
broadcast messages. When you have a lot of network interfaces (> 10k),
avoiding a second request to fetch all the information saves a lot of time.

My practical question is how often does it happen that we care?
In fact, I don't think that scenarios with a lot of netns have a full mesh of
x-netns interfaces. More likely there is one "link" netns holding the physical
interface, and every other netns has one interface whose link part is in this
"link" netns. Hence, only one nsid is needed in each netns.


Unfortunately in the case of a fully referencing mesh of N network
namespaces such a mesh winds up taking O(N^2) space, which seems
undesirable.
Memory consumption vs performance ;-)
In fact, when you have a lot of netns, you should already have some memory
available (at least N lo interfaces + N interfaces (veth or a x-netns
interface)). I'm not convinced that this is really an obstacle.

I would have to see how it all fits together. O(N^2) grows a lot faster
than N. So after a point it isn't in the same ballpark of memory
consumption.
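
To put numbers on the two cases being compared, here is a small
back-of-the-envelope sketch. It only counts table entries, under the
assumption that each stored id is one entry; the function names are made up
for this example. The "link" topology follows the description above (one
"link" netns, every other netns referencing only it), and it is assumed here
that the "link" netns itself stores no id.

```python
def full_mesh_entries(n):
    """Every one of n netns stores an id for each of the other n - 1."""
    return n * (n - 1)


def link_topology_entries(n):
    """One "link" netns; each of the other n - 1 netns stores one id for it."""
    return n - 1


for n in (10, 1000, 10000):
    print(n, full_mesh_entries(n), link_topology_entries(n))
```

For 10000 namespaces this is 99990000 entries in the full mesh against 9999
in the "link" topology, which is the gap between O(N^2) and O(N) in question.
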

broadcast message business, and only care about the remote namespace for
unicast messages. Putting the work in an infrequently used slow path
instead of a comparatively common path gives us much more freedom in
the implementation.
I think it's better to have full netlink messages instead of partial ones.
There are already a lot of attributes added to each rtnl interface message to
be sure to describe all parameters of these interfaces.
And if the user doesn't care about ids (user has not set any id with iproute2),
we can just add the same attribute with id 0 (let's say it's a reserved id) to
indicate that the link part of this interface is in another netns.
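
A minimal sketch of that fallback, to make the proposal concrete. Everything
here is illustrative: the function name and the plain dict are hypothetical,
and omitting the attribute when the link is in the same netns is an
assumption of this example, not something stated above.

```python
RESERVED_NSID = 0  # "let's say it's a reserved id": link is elsewhere, id unknown


def link_nsid_for_message(peer_ids, dev_netns, link_netns):
    """Pick the nsid to advertise for an interface's link part.

    peer_ids maps a peer netns name to the id set by the user (never 0).
    Returns None when no attribute is needed (assumed: same netns).
    """
    if link_netns == dev_netns:
        return None
    # Fall back to the reserved id when the user has not set one.
    return peer_ids.get(link_netns, RESERVED_NSID)


print(link_nsid_for_message({"link": 5}, "blue", "link"))  # user-assigned id
print(link_nsid_for_message({}, "blue", "link"))           # reserved id
print(link_nsid_for_message({}, "blue", "blue"))           # no attribute
```

With this shape a listener can always tell that the link part lives in
another netns, even when it cannot tell which one.
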

I imagine an id like that is something we would want ip netns add to
set, and probably set in all existing network namespaces as well.

The great benefit of your first proposal is that the ids are set by
userspace, which allows great flexibility.

Would you accept a patch that implements this first solution?

I would not fundamentally reject it. I would really like to make
certain we think through how it will be used and what the practical
benefits are. Depending on how it is used the data structure could
be a killer or it could be a case where we see how to manage it and
simply don't care.
I will send a v3, so we can talk about it.


Thank you,
Nicolas