RE: [PATCH net-next 11/14] vsock: add multi-transports support

From: Jorgen Hansen
Date: Mon Nov 11 2019 - 08:53:48 EST


> From: Stefano Garzarella [mailto:sgarzare@xxxxxxxxxx]
> Sent: Wednesday, October 23, 2019 11:56 AM

Thanks a lot for working on this!

> With the multi-transports support, we can use vsock with nested VMs
> (also using different hypervisors) by loading both guest->host and
> host->guest transports at the same time.
>
> Major changes:
> - vsock core module can be loaded regardless of the transports
> - vsock_core_init() and vsock_core_exit() are renamed to
> vsock_core_register() and vsock_core_unregister()
> - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
> to identify which directions the transport can handle and whether it
> supports DGRAM (only vmci)
> - each stream socket is assigned to a transport when the remote CID
> is set (during the connect() or when we receive a connection request
> on a listener socket).

How about allowing the transport to be set during bind as well? That
would allow an application to ensure that it is using a specific
transport: if it binds to the host CID, it will use H2G, and if it binds
to any other CID, it will use G2H. You can still use VMADDR_CID_ANY if
you want to listen on both transports initially.
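A minimal userspace model of that bind-time rule, for illustration only: select_on_bind() and enum pick are hypothetical names, not existing af_vsock code, and the "loaded" flags stand in for transport_h2g/transport_g2h being non-NULL. The VMADDR_CID_* values are the ones from include/uapi/linux/vm_sockets.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in include/uapi/linux/vm_sockets.h */
#define VMADDR_CID_ANY  (-1U)
#define VMADDR_CID_HOST 2U

/* Hypothetical result of a bind-time transport choice. */
enum pick { PICK_DEFER, PICK_H2G, PICK_G2H, PICK_NODEV };

/*
 * Hypothetical bind-time selection:
 *  - VMADDR_CID_ANY defers the choice, so a listener can accept on both
 *    transports and the per-connection (remote CID) rule applies later;
 *  - the host CID pins the socket to the host->guest transport;
 *  - any other CID pins it to the guest->host transport.
 * A transport that is not loaded is reported as PICK_NODEV.
 */
static enum pick select_on_bind(unsigned int bind_cid, bool h2g_loaded,
				bool g2h_loaded)
{
	if (bind_cid == VMADDR_CID_ANY)
		return PICK_DEFER;
	if (bind_cid == VMADDR_CID_HOST)
		return h2g_loaded ? PICK_H2G : PICK_NODEV;
	return g2h_loaded ? PICK_G2H : PICK_NODEV;
}
```

PICK_NODEV plays the role of the -ENODEV that vsock_assign_transport() returns when no suitable transport is registered; reporting it from bind() would surface the problem before connect() or accept.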


> The remote CID is used to decide which transport to use:
> - remote CID > VMADDR_CID_HOST will use host->guest transport
> - remote CID <= VMADDR_CID_HOST will use guest->host transport
> - listener sockets are not bound to any transport, since no transport
> operations are done on them. This way we can create a listener
> socket even if the transports are not loaded, or with VMADDR_CID_ANY
> to listen on all transports.
> - DGRAM sockets are handled as before, since only the vmci_transport
> provides this feature.
>
> Signed-off-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> ---
> RFC -> v1:
> - documented VSOCK_TRANSPORT_F_* flags
> - fixed vsock_assign_transport() when the socket is already assigned
> (e.g. connection failed)
> - moved features outside of struct vsock_transport, and used as
> parameter of vsock_core_register()
> ---
> drivers/vhost/vsock.c | 5 +-
> include/net/af_vsock.h | 17 +-
> net/vmw_vsock/af_vsock.c | 237 ++++++++++++++++++------
> net/vmw_vsock/hyperv_transport.c | 26 ++-
> net/vmw_vsock/virtio_transport.c | 7 +-
> net/vmw_vsock/virtio_transport_common.c | 28 ++-
> net/vmw_vsock/vmci_transport.c | 31 +++-
> 7 files changed, 270 insertions(+), 81 deletions(-)
>


> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index d89381166028..dddd85d9a147 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -130,7 +130,12 @@ static struct proto vsock_proto = {
> #define VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)
> #define VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
>
> -static const struct vsock_transport *transport_single;
> +/* Transport used for host->guest communication */
> +static const struct vsock_transport *transport_h2g;
> +/* Transport used for guest->host communication */
> +static const struct vsock_transport *transport_g2h;
> +/* Transport used for DGRAM communication */
> +static const struct vsock_transport *transport_dgram;
> static DEFINE_MUTEX(vsock_register_mutex);
>
> /**** UTILS ****/
> @@ -182,7 +187,7 @@ static int vsock_auto_bind(struct vsock_sock *vsk)
> return __vsock_bind(sk, &local_addr);
> }
>
> -static int __init vsock_init_tables(void)
> +static void vsock_init_tables(void)
> {
> int i;
>
> @@ -191,7 +196,6 @@ static int __init vsock_init_tables(void)
>
> for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> INIT_LIST_HEAD(&vsock_connected_table[i]);
> - return 0;
> }
>
> static void __vsock_insert_bound(struct list_head *list,
> @@ -376,6 +380,62 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected)
> }
> EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
>
> +/* Assign a transport to a socket and call the .init transport callback.
> + *
> + * Note: for stream socket this must be called when vsk->remote_addr is set
> + * (e.g. during the connect() or when a connection request on a listener
> + * socket is received).
> + * The vsk->remote_addr is used to decide which transport to use:
> + * - remote CID > VMADDR_CID_HOST will use host->guest transport
> + * - remote CID <= VMADDR_CID_HOST will use guest->host transport
> + */
> +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> +{
> + const struct vsock_transport *new_transport;
> + struct sock *sk = sk_vsock(vsk);
> +
> + switch (sk->sk_type) {
> + case SOCK_DGRAM:
> + new_transport = transport_dgram;
> + break;
> + case SOCK_STREAM:
> + if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> + new_transport = transport_h2g;
> + else
> + new_transport = transport_g2h;
> + break;

You already mentioned that you are working on a fix for loopback
here for the guest, but presumably a host could also do loopback.
If we select the transport during bind to a specific CID, this comment
isn't relevant; otherwise, we should look at the local addr as
well, since a socket with a local addr of the host CID shouldn't use
the guest-to-host transport, and a socket with a local addr > host CID
shouldn't use host-to-guest.
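A sketch of that local-addr check, again as a userspace model under stated assumptions: pick_stream_transport() is a hypothetical helper, the local CID is taken to be VMADDR_CID_ANY when the socket was not explicitly bound, and any local CID below VMADDR_CID_HOST falls back to the remote-CID rule from the patch.

```c
#include <assert.h>

/* Values as defined in include/uapi/linux/vm_sockets.h */
#define VMADDR_CID_ANY  (-1U)
#define VMADDR_CID_HOST 2U

enum pick { PICK_H2G, PICK_G2H };

/*
 * Hypothetical stream-socket selection that looks at the local address
 * before falling back to the remote-CID rule used in the patch.
 * VMADDR_CID_ANY is checked first because, as an unsigned value, it
 * compares greater than VMADDR_CID_HOST.
 */
static enum pick pick_stream_transport(unsigned int local_cid,
				       unsigned int remote_cid)
{
	if (local_cid != VMADDR_CID_ANY) {
		/* Bound to the host CID: we are the host, never use G2H. */
		if (local_cid == VMADDR_CID_HOST)
			return PICK_H2G;
		/* Bound to a guest CID: we are a guest, never use H2G. */
		if (local_cid > VMADDR_CID_HOST)
			return PICK_G2H;
	}
	/* Unbound (or a CID below the host CID): decide by the peer. */
	return remote_cid > VMADDR_CID_HOST ? PICK_H2G : PICK_G2H;
}
```

With the remote-only rule in the patch, a host socket bound to VMADDR_CID_HOST and connecting to VMADDR_CID_HOST would be given the guest->host transport; in this sketch the local addr keeps it on host->guest.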


> + default:
> + return -ESOCKTNOSUPPORT;
> + }
> +
> + if (vsk->transport) {
> + if (vsk->transport == new_transport)
> + return 0;
> +
> + vsk->transport->release(vsk);
> + vsk->transport->destruct(vsk);
> + }
> +
> + if (!new_transport)
> + return -ENODEV;
> +
> + vsk->transport = new_transport;
> +
> + return vsk->transport->init(vsk, psk);
> +}
> +EXPORT_SYMBOL_GPL(vsock_assign_transport);
> +
> +static bool vsock_find_cid(unsigned int cid)
> +{
> + if (transport_g2h && cid == transport_g2h->get_local_cid())
> + return true;
> +
> + if (transport_h2g && cid == VMADDR_CID_HOST)
> + return true;
> +
> + return false;
> +}
> +
> static struct sock *vsock_dequeue_accept(struct sock *listener)
> {
> struct vsock_sock *vlistener;


> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 5955238ffc13..2eb3f16d53e7 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c

> @@ -1017,6 +1018,15 @@ static int vmci_transport_recv_listen(struct sock *sk,
> vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context,
> pkt->src_port);
>
> + err = vsock_assign_transport(vpending, vsock_sk(sk));
> + /* Transport assigned (looking at remote_addr) must be the same
> + * where we received the request.
> + */
> + if (err || !vmci_check_transport(vpending)) {

We need to send a reset on error, i.e.,
vmci_transport_send_reset(sk, pkt);

> + sock_put(pending);
> + return err;
> + }
> +
> /* If the proposed size fits within our min/max, accept it. Otherwise
> * propose our own size.
> */

Thanks,
Jorgen