Re: [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon.

From: Michael S. Tsirkin
Date: Thu Aug 08 2013 - 02:39:30 EST


On Wed, Aug 07, 2013 at 08:04:13PM -0700, Sudeep Dutt wrote:
> From: Caz Yokoyama <Caz.Yokoyama@xxxxxxxxx>
>
> This patch introduces a sample user space daemon which
> implements the virtio device backends on the host. The daemon
> creates/removes/configures virtio device backends by communicating with
> the Intel MIC Host Driver. The virtio devices currently supported are
> virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
> The daemon also monitors card shutdown status and takes appropriate actions
> like killing the virtio backends and resetting the card upon card shutdown
> and crashes.
>
> Co-author: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
> Co-author: Sudeep Dutt <sudeep.dutt@xxxxxxxxx>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
> Signed-off-by: Caz Yokoyama <Caz.Yokoyama@xxxxxxxxx>
> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@xxxxxxxxx>
> Signed-off-by: Nikhil Rao <nikhil.rao@xxxxxxxxx>
> Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@xxxxxxxxx>
> Signed-off-by: Sudeep Dutt <sudeep.dutt@xxxxxxxxx>
> Acked-by: Yaozu (Eddie) Dong <eddie.dong@xxxxxxxxx>
> ---
> Documentation/mic/mic_overview.txt | 48 +
> Documentation/mic/mpssd/.gitignore | 1 +
> Documentation/mic/mpssd/Makefile | 19 +
> Documentation/mic/mpssd/micctrl | 152 ++++
> Documentation/mic/mpssd/mpss | 245 ++++++
> Documentation/mic/mpssd/mpssd.c | 1689 ++++++++++++++++++++++++++++++++++++
> Documentation/mic/mpssd/mpssd.h | 100 +++
> Documentation/mic/mpssd/sysfs.c | 103 +++

Is this generally useful or just example code?
If the former, you can put it in tools/ as well.

> 8 files changed, 2357 insertions(+)
> create mode 100644 Documentation/mic/mic_overview.txt
> create mode 100644 Documentation/mic/mpssd/.gitignore
> create mode 100644 Documentation/mic/mpssd/Makefile
> create mode 100755 Documentation/mic/mpssd/micctrl
> create mode 100755 Documentation/mic/mpssd/mpss
> create mode 100644 Documentation/mic/mpssd/mpssd.c
> create mode 100644 Documentation/mic/mpssd/mpssd.h
> create mode 100644 Documentation/mic/mpssd/sysfs.c
>
> diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
> new file mode 100644
> index 0000000..8b1a916
> --- /dev/null
> +++ b/Documentation/mic/mic_overview.txt
> @@ -0,0 +1,48 @@
> +An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> +card based on the Intel Many Integrated Core (MIC) architecture
> +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> +implements the three required standard address spaces i.e. configuration,
> +memory and I/O. The host OS loads a device driver as is typical for
> +PCIe devices. The card itself runs a bootstrap after reset that
> +transfers control to the card OS downloaded from the host driver.
> +The card OS as shipped by Intel is a Linux kernel with modifications
> +for the X100 devices.
> +
> +Since it is a PCIe card, it does not have the ability to host hardware
> +devices for networking, storage and console. We provide these devices
> +on X100 coprocessors thus enabling a self-bootable equivalent environment
> +for applications. A key benefit of our solution is that it leverages
> +the standard virtio framework for network, disk and console devices,
> +though in our case the virtio framework is used across a PCIe bus.
> +
> +Here is a block diagram of the various components described above. The
> +virtio backends are situated on the host rather than the card given better
> +single threaded performance for the host compared to MIC and the ability of
> +the host to initiate DMAs to/from the card using the MIC DMA engine.
> +
> + |
> + +----------+ | +----------+
> + | Card OS | | | Host OS |
> + +----------+ | +----------+
> + |
> ++-------+ +--------+ +------+ | +---------+ +--------+ +--------+
> +| Virtio| |Virtio | |Virtio| | |Virtio | |Virtio | |Virtio |
> +| Net | |Console | |Block | | |Net | |Console | |Block |
> +| Driver| |Driver | |Driver| | |backend | |backend | |backend |
> ++-------+ +--------+ +------+ | +---------+ +--------+ +--------+
> + | | | | | | |
> + | | | |Ring 3| | |
> + | | | |------|------------|---------|-------
> + +-------------------+ |Ring 0+--------------------------+
> + | | | Virtio over PCIe IOCTLs |
> + | | +--------------------------+
> + +--------------+ | |
> + |Intel MIC | | +---------------+
> + |Card Driver | | |Intel MIC |
> + +--------------+ | |Host Driver |
> + | | +---------------+
> + | | |
> + +-------------------------------------------------------------+
> + | |
> + | PCIe Bus |
> + +-------------------------------------------------------------+
> diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore
> new file mode 100644
> index 0000000..8b7c72f
> --- /dev/null
> +++ b/Documentation/mic/mpssd/.gitignore
> @@ -0,0 +1 @@
> +mpssd
> diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile
> new file mode 100644
> index 0000000..eb860a7
> --- /dev/null
> +++ b/Documentation/mic/mpssd/Makefile
> @@ -0,0 +1,19 @@
> +#
> +# Makefile - Intel MIC User Space Tools.
> +# Copyright(c) 2013, Intel Corporation.
> +#
> +ifdef DEBUG
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG)
> +else
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall
> +endif
> +
> +mpssd: mpssd.o sysfs.o
> + $(CC) $(CFLAGS) -o $@ $^ -lpthread
> +
> +install:
> + install mpssd /usr/sbin/mpssd
> + install micctrl /usr/sbin/micctrl
> +
> +clean:
> + rm -f mpssd *.o
> diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl
> new file mode 100755
> index 0000000..e0cfa53
> --- /dev/null
> +++ b/Documentation/mic/mpssd/micctrl
> @@ -0,0 +1,152 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# micctrl - Controls MIC boot/start/stop.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: micctrl
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +sysfs="/sys/class/mic"
> +
> +status()
> +{
> + if [ "`echo $1 | head -c3`" == "mic" ]; then
> + f=$sysfs/$1
> + echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"
> + return 0
> + fi
> +
> + if [ -d "$sysfs" ]; then
> + for f in $sysfs/*
> + do
> + echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`""
> + done
> + fi
> +
> + return 0
> +}
> +
> +reset()
> +{
> + if [ "`echo $1 | head -c3`" == "mic" ]; then
> + f=$sysfs/$1
> + echo reset > $f/state
> + return 0
> + fi
> +
> + if [ -d "$sysfs" ]; then
> + for f in $sysfs/*
> + do
> + echo reset > $f/state
> + done
> + fi
> +
> + return 0
> +}
> +
> +boot()
> +{
> + if [ "`echo $1 | head -c3`" == "mic" ]; then
> + f=$sysfs/$1
> + echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state
> + return 0
> + fi
> +
> + if [ -d "$sysfs" ]; then
> + for f in $sysfs/*
> + do
> + echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> + done
> + fi
> +
> + return 0
> +}
> +
> +shutdown()
> +{
> + if [ "`echo $1 | head -c3`" == "mic" ]; then
> + f=$sysfs/$1
> + echo shutdown > $f/state
> + return 0
> + fi
> +
> + if [ -d "$sysfs" ]; then
> + for f in $sysfs/*
> + do
> + echo shutdown > $f/state
> + done
> + fi
> +
> + return 0
> +}
> +
> +wait()
> +{
> + if [ "`echo $1 | head -c3`" == "mic" ]; then
> + f=$sysfs/$1
> + while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> + do
> + sleep 1
> + echo -e "Waiting for $1 to go offline"
> + done
> + return 0
> + fi
> +
> + if [ -d "$sysfs" ]; then
> + # Wait for the cards to go offline
> + for f in $sysfs/*
> + do
> + while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> + do
> + sleep 1
> + echo -e "Waiting for "`basename $f`" to go offline"
> + done
> + done
> + fi
> +}
> +
> +case $1 in
> + -s)
> + status $2
> + ;;
> + -r)
> + reset $2
> + ;;
> + -b)
> + boot $2
> + ;;
> + -S)
> + shutdown $2
> + ;;
> + -w)
> + wait $2
> + ;;
> + *)
> + echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}"
> + exit 2
> +esac
> +
> +exit $?
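
For reference, the boot request that boot() above writes into the card's state attribute can be sketched in C as follows. This is only an illustration of the string format visible in the script; demo_boot_cmd() is a hypothetical helper, not something the patch provides, and the "mic" prefix check mirrors the head -c3 test in the script.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the string micctrl writes to <sysfs>/<card>/state to boot a card.
 * Returns the string length, or -1 if the name does not start with "mic"
 * or the result does not fit in buf. Hypothetical helper for illustration.
 */
static int demo_boot_cmd(char *buf, size_t len, const char *card)
{
	int n;

	if (strncmp(card, "mic", 3) != 0)
		return -1;
	n = snprintf(buf, len, "boot:linux:mic/uos.img:mic/%s.image", card);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output was truncated */
	return n;
}
```

Calling demo_boot_cmd(buf, sizeof(buf), "mic0") fills buf with "boot:linux:mic/uos.img:mic/mic0.image".
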
> diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
> new file mode 100755
> index 0000000..f0bb3dd
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpss
> @@ -0,0 +1,245 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# mpss Start mpssd.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: mpss
> +# Required-Start:
> +# Required-Stop:
> +# Short-Description: MPSS stack control
> +# Description: MPSS stack control
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +exec=/usr/sbin/mpssd
> +sysfs="/sys/class/mic"
> +
> +start()
> +{
> + [ -x $exec ] || exit 5
> +
> + echo -e $"Starting MPSS Stack"
> +
> + echo -e $"Loading MIC_HOST Module"
> +
> + # Ensure the driver is loaded
> + [ -d "$sysfs" ] || modprobe mic_host
> +
> + if [ "`ps -e | awk '{print $4}' | grep mpssd | head -1`" = "mpssd" ]; then
> + echo -e $"MPSSD already running! "
> + success
> + echo
> + return 0;
> + fi
> +
> + # Start the daemon
> + echo -n $"Starting MPSSD"
> + $exec &
> + RETVAL=$?
> + if [ $RETVAL -ne 0 ]; then
> + failure
> + else
> + success
> + fi
> + echo
> +
> + sleep 5
> +
> + # Boot the cards
> + if [ $RETVAL -eq 0 ]; then
> + for f in $sysfs/*
> + do
> + echo -ne "Booting "`basename $f`" "
> + echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> + RETVAL=$?
> + if [ $RETVAL -ne 0 ]; then
> + failure
> + else
> + success
> + fi
> + echo
> + done
> + fi
> +
> + # Wait till ping works
> + if [ $RETVAL -eq 0 ]; then
> + for f in $sysfs/*
> + do
> + count=100
> + ipaddr=`cat $f/cmdline`
> + ipaddr=${ipaddr#*address,}
> + ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1`
> +
> + while [ $count -ge 0 ]
> + do
> + echo -e "Pinging "`basename $f`" "
> + ping -c 1 $ipaddr &> /dev/null
> + RETVAL=$?
> + if [ $RETVAL -eq 0 ]; then
> + success
> + break
> + fi
> + sleep 1
> + count=`expr $count - 1`
> + done
> + if [ $RETVAL -ne 0 ]; then
> + failure
> + else
> + success
> + fi
> + echo
> + done
> + fi
> + return $RETVAL
> +}
> +
> +stop()
> +{
> + echo -e $"Shutting down MPSS Stack: "
> +
> + # Bail out if module is unloaded
> + if [ ! -d "$sysfs" ]; then
> + echo -n $"Module unloaded "
> + killall -9 mpssd 2>/dev/null
> + success
> + echo
> + return 0
> + fi
> +
> + # Shut down the cards
> + for f in $sysfs/*
> + do
> + echo -e "Shutting down `basename $f` "
> + echo "shutdown" > $f/state 2>/dev/null
> + done
> +
> + # Wait for the cards to go offline
> + for f in $sysfs/*
> + do
> + while [ "`cat $f/state`" != "offline" ]
> + do
> + sleep 1
> + echo -e "Waiting for "`basename $f`" to go offline"
> + done
> + done
> +
> + # Display the status of the cards
> + for f in $sysfs/*
> + do
> + echo -e ""`basename $f`" state: "`cat $f/state`""
> + done
> +
> + sleep 5
> +
> + # Kill MPSSD now
> + echo -n $"Killing MPSSD"
> + killall -9 mpssd 2>/dev/null
> + RETVAL=$?
> + if [ $RETVAL -ne 0 ]; then
> + failure
> + else
> + success
> + fi
> + echo
> + return $RETVAL
> +}
> +
> +restart()
> +{
> + stop
> + sleep 5
> + start
> +}
> +
> +status()
> +{
> + if [ -d "$sysfs" ]; then
> + for f in $sysfs/*
> + do
> + echo -e ""`basename $f`" state: "`cat $f/state`""
> + done
> + fi
> +
> + if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
> + echo "mpssd is running"
> + else
> + echo "mpssd is stopped"
> + fi
> + return 0
> +}
> +
> +unload()
> +{
> + if [ ! -d "$sysfs" ]; then
> + echo -n $"No MIC_HOST Module: "
> + killall -9 mpssd 2>/dev/null
> + success
> + echo
> + return
> + fi
> +
> + stop
> + RETVAL=$?
> +
> + sleep 5
> + echo -n $"Removing MIC_HOST Module: "
> +
> + if [ $RETVAL = 0 ]; then
> + sleep 1
> + modprobe -r mic_host
> + RETVAL=$?
> + fi
> +
> + if [ $RETVAL -ne 0 ]; then
> + failure
> + else
> + success
> + fi
> + echo
> + return $RETVAL
> +}
> +
> +case $1 in
> + start)
> + start
> + ;;
> + stop)
> + stop
> + ;;
> + restart)
> + restart
> + ;;
> + status)
> + status
> + ;;
> + unload)
> + unload
> + ;;
> + *)
> + echo $"Usage: $0 {start|stop|restart|status|unload}"
> + exit 2
> +esac
> +
> +exit $?
> diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c
> new file mode 100644
> index 0000000..3bc34cb
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.c
> @@ -0,0 +1,1689 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <getopt.h>
> +#include <assert.h>
> +#include <unistd.h>
> +#include <stdbool.h>
> +#include <signal.h>
> +#include <poll.h>
> +#include <features.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/mman.h>
> +#include <sys/socket.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_blk.h>
> +#include <linux/version.h>
> +#include "mpssd.h"
> +#include <linux/mic_ioctl.h>
> +#include <linux/mic_common.h>
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static FILE *logfp;
> +static struct mic_info mic_list;
> +
> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +
> +#define min_t(type, x, y) ({ \
> + type __min1 = (x); \
> + type __min2 = (y); \
> + __min1 < __min2 ? __min1 : __min2; })
> +
> +/* align addr on a size boundary - adjust address up/down if needed */
> +#define _ALIGN_UP(addr, size) (((addr)+((size)-1))&(~((size)-1)))
> +#define _ALIGN_DOWN(addr, size) ((addr)&(~((size)-1)))
> +
> +/* align addr on a size boundary - adjust address up if needed */
> +#define _ALIGN(addr, size) _ALIGN_UP(addr, size)
> +
> +/* to align the pointer to the (next) page boundary */
> +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE)
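
A small sketch of what the alignment macros above compute, written as functions so the behaviour is easy to check. The demo_ names are hypothetical, and the arithmetic is only valid when size is a power of two, which the macros also assume silently.

```c
#include <stddef.h>

/* Round addr up to the next multiple of size; size must be a power of two. */
static size_t demo_align_up(size_t addr, size_t size)
{
	return (addr + (size - 1)) & ~(size - 1);
}

/* Round addr down to the previous multiple of size (power of two only). */
static size_t demo_align_down(size_t addr, size_t size)
{
	return addr & ~(size - 1);
}
```

With this arithmetic, MAX_NET_PKT_SIZE further down works out to 65600 bytes: 64 KiB plus the 14-byte Ethernet header, rounded up to a multiple of 64.
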
> +
> +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +
> +/* Insert REP NOP (PAUSE) in busy-wait loops. */
> +static inline void cpu_relax(void)
> +{
> + asm volatile("rep; nop" : : : "memory");
> +}
> +
> +#define GSO_ENABLED 1
> +#define MAX_GSO_SIZE (64 * 1024)
> +#define ETH_H_LEN 14
> +#define MAX_NET_PKT_SIZE (_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
> +#define MIC_DEVICE_PAGE_END 0x1000
> +
> +#ifndef VIRTIO_NET_HDR_F_DATA_VALID
> +#define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
> +#endif
> +
> +static struct {
> + struct mic_device_desc dd;
> + struct mic_vqconfig vqconfig[2];
> + __u32 host_features, guest_acknowledgements;
> + struct virtio_console_config cons_config;
> +} virtcons_dev_page = {
> + .dd = {
> + .type = VIRTIO_ID_CONSOLE,
> + .num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
> + .feature_len = sizeof(virtcons_dev_page.host_features),
> + .config_len = sizeof(virtcons_dev_page.cons_config),
> + },
> + .vqconfig[0] = {
> + .num = htole16(MIC_VRING_ENTRIES),
> + },
> + .vqconfig[1] = {
> + .num = htole16(MIC_VRING_ENTRIES),
> + },
> +};
> +
> +static struct {
> + struct mic_device_desc dd;
> + struct mic_vqconfig vqconfig[2];
> + __u32 host_features, guest_acknowledgements;
> + struct virtio_net_config net_config;
> +} virtnet_dev_page = {
> + .dd = {
> + .type = VIRTIO_ID_NET,
> + .num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
> + .feature_len = sizeof(virtnet_dev_page.host_features),
> + .config_len = sizeof(virtnet_dev_page.net_config),
> + },
> + .vqconfig[0] = {
> + .num = htole16(MIC_VRING_ENTRIES),
> + },
> + .vqconfig[1] = {
> + .num = htole16(MIC_VRING_ENTRIES),
> + },
> +#if GSO_ENABLED
> + .host_features = htole32(
> + 1 << VIRTIO_NET_F_CSUM |
> + 1 << VIRTIO_NET_F_GSO |
> + 1 << VIRTIO_NET_F_GUEST_TSO4 |
> + 1 << VIRTIO_NET_F_GUEST_TSO6 |
> + 1 << VIRTIO_NET_F_GUEST_ECN |
> + 1 << VIRTIO_NET_F_GUEST_UFO),
> +#else
> + .host_features = 0,
> +#endif
> +};
> +
> +static const char *mic_config_dir = "/etc/sysconfig/mic";
> +static const char *virtblk_backend = "VIRTBLK_BACKEND";
> +static struct {
> + struct mic_device_desc dd;
> + struct mic_vqconfig vqconfig[1];
> + __u32 host_features, guest_acknowledgements;
> + struct virtio_blk_config blk_config;
> +} virtblk_dev_page = {
> + .dd = {
> + .type = VIRTIO_ID_BLOCK,
> + .num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
> + .feature_len = sizeof(virtblk_dev_page.host_features),
> + .config_len = sizeof(virtblk_dev_page.blk_config),
> + },
> + .vqconfig[0] = {
> + .num = htole16(MIC_VRING_ENTRIES),
> + },
> + .host_features =
> + htole32(1<<VIRTIO_BLK_F_SEG_MAX),
> + .blk_config = {
> + .seg_max = htole32(MIC_VRING_ENTRIES - 2),
> + .capacity = htole64(0),
> + }
> +};
> +
> +static char *myname;
> +
> +static int
> +tap_configure(struct mic_info *mic, char *dev)
> +{
> + pid_t pid;
> + char *ifargv[7];
> + char ipaddr[IFNAMSIZ];
> + int ret = 0;
> +
> + pid = fork();
> + if (pid == 0) {
> + ifargv[0] = "ip";
> + ifargv[1] = "link";
> + ifargv[2] = "set";
> + ifargv[3] = dev;
> + ifargv[4] = "up";
> + ifargv[5] = NULL;
> + mpsslog("Configuring %s\n", dev);
> + ret = execvp("ip", ifargv);
> + if (ret < 0) {
> + mpsslog("%s execvp failed errno %s\n",
> + mic->name, strerror(errno));
> + return ret;
> + }
> + }
> + if (pid < 0) {
> + mpsslog("%s fork failed errno %s\n",
> + mic->name, strerror(errno));
> + return pid;
> + }
> +
> + ret = waitpid(pid, NULL, 0);
> + if (ret < 0) {
> + mpsslog("%s waitpid failed errno %s\n",
> + mic->name, strerror(errno));
> + return ret;
> + }
> +
> + snprintf(ipaddr, IFNAMSIZ, "172.31.%d.254/24", mic->id);
> +
> + pid = fork();
> + if (pid == 0) {
> + ifargv[0] = "ip";
> + ifargv[1] = "addr";
> + ifargv[2] = "add";
> + ifargv[3] = ipaddr;
> + ifargv[4] = "dev";
> + ifargv[5] = dev;
> + ifargv[6] = NULL;
> + mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr);
> + ret = execvp("ip", ifargv);
> + if (ret < 0) {
> + mpsslog("%s execvp failed errno %s\n",
> + mic->name, strerror(errno));
> + return ret;
> + }
> + }
> + if (pid < 0) {
> + mpsslog("%s fork failed errno %s\n",
> + mic->name, strerror(errno));
> + return pid;
> + }
> +
> + ret = waitpid(pid, NULL, 0);
> + if (ret < 0) {
> + mpsslog("%s waitpid failed errno %s\n",
> + mic->name, strerror(errno));
> + return ret;
> + }
> + mpsslog("MIC name %s %s %d DONE!\n",
> + mic->name, __func__, __LINE__);
> + return 0;
> +}
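
One thing worth checking in tap_configure() is the size of ipaddr. The sketch below assumes IFNAMSIZ is 16, as it is in <net/if.h> on Linux; demo_host_ip() is a hypothetical helper that formats the same address and reports whether snprintf() had to truncate.

```c
#include <stdio.h>

#define DEMO_IFNAMSIZ 16	/* assumed value of IFNAMSIZ on Linux */

/* Returns 0 if the address fits in buf, -1 if snprintf() had to truncate. */
static int demo_host_ip(char *buf, size_t len, int id)
{
	int n = snprintf(buf, len, "172.31.%d.254/24", id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under that assumption a 16-byte buffer holds "172.31.0.254/24" exactly, but a two-digit card id needs 17 bytes including the terminator, so a buffer sized for the address rather than for an interface name would be safer.
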
> +
> +static int tun_alloc(struct mic_info *mic, char *dev)
> +{
> + struct ifreq ifr;
> + int fd, err;
> +#if GSO_ENABLED
> + unsigned offload;
> +#endif
> + fd = open("/dev/net/tun", O_RDWR);
> + if (fd < 0) {
> + mpsslog("Could not open /dev/net/tun %s\n", strerror(errno));
> + goto done;
> + }
> +
> + memset(&ifr, 0, sizeof(ifr));
> +
> + ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
> + if (*dev)
> + strncpy(ifr.ifr_name, dev, IFNAMSIZ);
> +
> + err = ioctl(fd, TUNSETIFF, (void *) &ifr);
> + if (err < 0) {
> + mpsslog("%s %s %d TUNSETIFF failed %s\n",
> + mic->name, __func__, __LINE__, strerror(errno));
> + close(fd);
> + return err;
> + }
> +#if GSO_ENABLED
> + offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> + TUN_F_TSO_ECN | TUN_F_UFO;
> +
> + err = ioctl(fd, TUNSETOFFLOAD, offload);
> + if (err < 0) {
> + mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n",
> + mic->name, __func__, __LINE__, strerror(errno));
> + close(fd);
> + return err;
> + }
> +#endif
> + strcpy(dev, ifr.ifr_name);
> + mpsslog("Created TAP %s\n", dev);
> +done:
> + return fd;
> +}
> +
> +#define NET_FD_VIRTIO_NET 0
> +#define NET_FD_TUN 1
> +#define MAX_NET_FD 2
> +
> +static void * *
> +get_dp(struct mic_info *mic, int type)
> +{
> + switch (type) {
> + case VIRTIO_ID_CONSOLE:
> + return &mic->mic_console.console_dp;
> + case VIRTIO_ID_NET:
> + return &mic->mic_net.net_dp;
> + case VIRTIO_ID_BLOCK:
> + return &mic->mic_virtblk.block_dp;
> + }
> + mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> + assert(0);
> + return NULL;
> +}
> +
> +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type)
> +{
> + struct mic_device_desc *d;
> + int i;
> + void *dp = *get_dp(mic, type);
> +
> + for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE;
> + i += mic_total_desc_size(d)) {
> + d = dp + i;
> +
> + /* End of list */
> + if (d->type == 0)
> + break;
> +
> + if (d->type == -1)
> + continue;
> +
> + mpsslog("%s %s d-> type %d d %p\n",
> + mic->name, __func__, d->type, d);
> +
> + if (d->type == (__u8)type)
> + return d;
> + }
> + mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> + assert(0);
> + return NULL;
> +}
> +
> +/* See comments in vhost.c for explanation of next_desc() */
> +static unsigned next_desc(struct vring_desc *desc)
> +{
> + unsigned int next;
> +
> + if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT))
> + return -1U;
> + next = le16toh(desc->next);
> + return next;
> +}
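
To illustrate how a helper like next_desc() is typically used, here is a minimal sketch that walks a descriptor chain and counts its entries. struct demo_desc is a local stand-in with the same field layout as a split-ring descriptor, and demo_chain_len() with its loop guard is an assumption for illustration, not code from the patch.

```c
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdint.h>

#define DEMO_DESC_F_NEXT 1	/* same value as VRING_DESC_F_NEXT */

/* Local stand-in for a split-ring descriptor (little-endian fields). */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Index of the next descriptor in the chain, or -1U at the end. */
static unsigned demo_next_desc(const struct demo_desc *d)
{
	if (!(le16toh(d->flags) & DEMO_DESC_F_NEXT))
		return -1U;
	return le16toh(d->next);
}

/* Count the descriptors in the chain starting at head. */
static unsigned demo_chain_len(const struct demo_desc *table,
			       unsigned num, unsigned head)
{
	unsigned i = head, n = 0;

	/* n < num bounds the walk in case the chain loops back on itself */
	while (i < num && n < num) {
		n++;
		i = demo_next_desc(&table[i]);
	}
	return n;
}
```

Because -1U is larger than any valid ring size, the end-of-chain marker also terminates the loop through the i < num test.
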
> +
> +/* Sum the lengths of all iovec entries */
> +static ssize_t
> +sum_iovec_len(struct mic_copy_desc *copy)
> +{
> + ssize_t sum = 0;
> + int i;
> +
> + for (i = 0; i < copy->iovcnt; i++)
> + sum += copy->iov[i].iov_len;
> + return sum;
> +}
> +
> +static inline void verify_out_len(struct mic_info *mic,
> + struct mic_copy_desc *copy)
> +{
> + if (copy->out_len != sum_iovec_len(copy)) {
> + mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%x\n",
> + mic->name, __func__, __LINE__,
> + copy->out_len, sum_iovec_len(copy));
> + assert(copy->out_len == sum_iovec_len(copy));
> + }
> +}
> +
> +/* Display an iovec */
> +static void
> +disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy,
> + const char *s, int line)
> +{
> + int i;
> +
> + for (i = 0; i < copy->iovcnt; i++)
> + mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%lx\n",
> + mic->name, s, line, i,
> + copy->iov[i].iov_base, copy->iov[i].iov_len);
> +}
> +
> +static inline __u16 read_avail_idx(struct mic_vring *vr)
> +{
> + return ACCESS_ONCE(vr->info->avail_idx);
> +}
> +
> +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
> + struct mic_copy_desc *copy, ssize_t len)
> +{
> + copy->vr_idx = tx ? 0 : 1;
> + copy->update_used = true;
> + if (type == VIRTIO_ID_NET)
> + copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr);
> + else
> + copy->iov[0].iov_len = len;
> +}
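
For the net case, txrx_prepare() relies on iov[0] holding the virtio net header and iov[1] the payload, so the payload length is the total minus the header. A rough sketch of that bookkeeping follows; the 10-byte header size is an assumption (struct virtio_net_hdr without mergeable buffers), the short-read check is an addition the patch does not have, and both helper names are hypothetical.

```c
#include <sys/types.h>
#include <sys/uio.h>

#define DEMO_VNET_HDR_LEN 10	/* assumed sizeof(struct virtio_net_hdr) */

/* Trim the payload iovec so that header + payload equals the bytes read. */
static int demo_trim_payload(struct iovec iov[2], ssize_t len)
{
	if (len < DEMO_VNET_HDR_LEN)
		return -1;	/* short read: no complete header */
	iov[1].iov_len = (size_t)len - DEMO_VNET_HDR_LEN;
	return 0;
}

/* Sum the lengths of all iovec entries. */
static ssize_t demo_iov_total(const struct iovec *iov, int iovcnt)
{
	ssize_t sum = 0;
	int i;

	for (i = 0; i < iovcnt; i++)
		sum += (ssize_t)iov[i].iov_len;
	return sum;
}
```

After a 1510-byte read from the tap device the payload entry shrinks to 1500 bytes and the iovec total matches the read length, which is the invariant verify_out_len() asserts after the copy.
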
> +
> +/* Central API which triggers the copies */
> +static int
> +mic_virtio_copy(struct mic_info *mic, int fd,
> + struct mic_vring *vr, struct mic_copy_desc *copy)
> +{
> + int ret;
> +
> + ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy);
> + if (ret) {
> + mpsslog("%s %s %d errno %s ret %d\n",
> + mic->name, __func__, __LINE__,
> + strerror(errno), ret);
> + }
> + return ret;
> +}
> +
> +/*
> + * This initialization routine requires at least one
> + * vring i.e. vr0. vr1 is optional.
> + */
> +static void *
> +init_vr(struct mic_info *mic, int fd, int type,
> + struct mic_vring *vr0, struct mic_vring *vr1, int num_vq)
> +{
> + int vr_size;
> + char *va;
> +
> + vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> + MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> + va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq,
> + PROT_READ, MAP_SHARED, fd, 0);
> + if (MAP_FAILED == va) {
> + mpsslog("%s %s %d mmap failed errno %s\n",
> + mic->name, __func__, __LINE__,
> + strerror(errno));
> + goto done;
> + }
> + *get_dp(mic, type) = (void *)va;
> + vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END];
> + vr0->info = vr0->va +
> + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN);
> + vring_init(&vr0->vr,
> + MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN);
> + mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ",
> + __func__, mic->name, vr0->va, vr0->info, vr_size,
> + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> + mpsslog("magic 0x%x expected 0x%x\n",
> + vr0->info->magic, MIC_MAGIC + type + 0);
> + assert(vr0->info->magic == MIC_MAGIC + type + 0);
> + if (vr1) {
> + vr1->va = (struct mic_vring *)
> + &va[MIC_DEVICE_PAGE_END + vr_size];
> + vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES,
> + MIC_VIRTIO_RING_ALIGN);
> + vring_init(&vr1->vr,
> + MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN);
> + mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ",
> + __func__, mic->name, vr1->va, vr1->info, vr_size,
> + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> + mpsslog("magic 0x%x expected 0x%x\n",
> + vr1->info->magic, MIC_MAGIC + type + 1);
> + assert(vr1->info->magic == MIC_MAGIC + type + 1);
> + }
> +done:
> + return va;
> +}
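
The layout init_vr() expects from the mapping, with the device page first and each ring at a page-aligned stride after it, can be sketched as below. The ring size follows the legacy vring_size() formula from <linux/virtio_ring.h>; the 4096-byte page size and the example values used afterwards (128 entries, 4096-byte ring alignment, 16-byte info struct) are assumptions, since those definitions are not part of the quoted text.

```c
#include <stddef.h>

#define DEMO_DEVICE_PAGE_END 0x1000	/* MIC_DEVICE_PAGE_END in the patch */
#define DEMO_PAGE_SIZE 4096		/* assumed page size */

/* Legacy split-ring size, following vring_size() in <linux/virtio_ring.h>. */
static size_t demo_vring_size(size_t num, size_t align)
{
	/* 16-byte descriptors, then avail: flags, idx, ring[num], used_event */
	size_t sz = 16 * num + 2 * (3 + num);

	sz = (sz + align - 1) & ~(align - 1);
	/* used: flags, idx, avail_event, then 8-byte used elements */
	return sz + 2 * 3 + 8 * num;
}

/* Per-ring stride: ring plus trailing info struct, rounded up to a page. */
static size_t demo_vr_stride(size_t num, size_t align, size_t info_size)
{
	size_t sz = demo_vring_size(num, align) + info_size;

	return (sz + DEMO_PAGE_SIZE - 1) & ~((size_t)DEMO_PAGE_SIZE - 1);
}

/* Byte offset of ring number idx inside the mapping. */
static size_t demo_ring_offset(size_t idx, size_t stride)
{
	return DEMO_DEVICE_PAGE_END + idx * stride;
}
```

With the assumed values each ring occupies 5126 bytes, the stride is 8192 bytes, and the two rings of a net or console device start at offsets 0x1000 and 0x3000.
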
> +
> +static void
> +uninit_vr(struct mic_info *mic, int num_vq)
> +{
> + int vr_size, ret;
> +
> + vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> + MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> + ret = munmap(mic->mic_virtblk.block_dp,
> + MIC_DEVICE_PAGE_END + vr_size * num_vq);
> + if (ret < 0)
> + mpsslog("%s munmap errno %d\n", mic->name, errno);
> +}
> +
> +static void
> +wait_for_card_driver(struct mic_info *mic, int fd, int type)
> +{
> + struct pollfd pollfd;
> + int err;
> + struct mic_device_desc *desc = get_device_desc(mic, type);
> +
> + pollfd.fd = fd;
> + mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n",
> + mic->name, __func__, type, desc->status);
> + while (1) {
> + pollfd.events = POLLIN;
> + pollfd.revents = 0;
> + err = poll(&pollfd, 1, -1);
> + if (err < 0) {
> + mpsslog("%s %s poll failed %s\n",
> + mic->name, __func__, strerror(errno));
> + continue;
> + }
> +
> + if (pollfd.revents) {
> + mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n",
> + mic->name, __func__, type, desc->status);
> + if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> + mpsslog("%s %s poll.revents %d\n",
> + mic->name, __func__, pollfd.revents);
> + mpsslog("%s %s desc-> type %d status 0x%x\n",
> + mic->name, __func__, type,
> + desc->status);
> + break;
> + }
> + }
> + }
> +}
> +
> +/* Spin till we have some descriptors */
> +static void
> +wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr)
> +{
> + __u16 avail_idx = read_avail_idx(vr);
> +
> + while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
> +#ifdef DEBUG
> + mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
> + mic->name, __func__,
> + le16toh(vr->vr.avail->idx), vr->info->avail_idx);
> +#endif
> + cpu_relax();
> + }
> +}
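
The comparison in wait_for_descriptors() only tests the two 16-bit indexes for inequality, which stays correct when the ring index wraps. A sketch of the related "how many are pending" calculation, assuming both indexes are free-running 16-bit counters; demo_pending() is hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Number of buffers published by the driver but not yet consumed. */
static uint16_t demo_pending(uint16_t avail_idx, uint16_t consumed_idx)
{
	/* unsigned 16-bit subtraction gives the right count across wraparound */
	return (uint16_t)(avail_idx - consumed_idx);
}
```

A result of zero corresponds to the condition under which the loop above keeps spinning.
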
> +
> +static void *
> +virtio_net(void *arg)
> +{
> + static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)];
> + static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64);
> + struct iovec vnet_iov[2][2] = {
> + { { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) },
> + { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } },
> + { { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) },
> + { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } },
> + };
> + struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1];
> + struct mic_info *mic = (struct mic_info *)arg;
> + char if_name[IFNAMSIZ];
> + struct pollfd net_poll[MAX_NET_FD];
> + struct mic_vring tx_vr, rx_vr;
> + struct mic_copy_desc copy;
> + struct mic_device_desc *desc;
> + int err;
> +
> + snprintf(if_name, IFNAMSIZ, "mic%d", mic->id);
> + mic->mic_net.tap_fd = tun_alloc(mic, if_name);
> + if (mic->mic_net.tap_fd < 0)
> + goto done;
> +
> + if (tap_configure(mic, if_name))
> + goto done;
> + mpsslog("MIC name %s id %d\n", mic->name, mic->id);
> +
> + net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd;
> + net_poll[NET_FD_VIRTIO_NET].events = POLLIN;
> + net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd;
> + net_poll[NET_FD_TUN].events = POLLIN;
> +
> + if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd,
> + VIRTIO_ID_NET, &tx_vr, &rx_vr,
> + virtnet_dev_page.dd.num_vq)) {
> + mpsslog("%s init_vr failed %s\n",
> + mic->name, strerror(errno));
> + goto done;
> + }
> +
> + copy.iovcnt = 2;
> + desc = get_device_desc(mic, VIRTIO_ID_NET);
> +
> + while (1) {
> + ssize_t len;
> +
> + net_poll[NET_FD_VIRTIO_NET].revents = 0;
> + net_poll[NET_FD_TUN].revents = 0;
> +
> + /* Start polling for data from tap and virtio net */
> + err = poll(net_poll, 2, -1);
> + if (err < 0) {
> + mpsslog("%s poll failed %s\n",
> + __func__, strerror(errno));
> + continue;
> + }
> + if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> + wait_for_card_driver(mic, mic->mic_net.virtio_net_fd,
> + VIRTIO_ID_NET);
> + /*
> + * Check if there is data to be read from TUN and write to
> + * virtio net fd if there is.
> + */
> + if (net_poll[NET_FD_TUN].revents & POLLIN) {
> + copy.iov = iov0;
> + len = readv(net_poll[NET_FD_TUN].fd,
> + copy.iov, copy.iovcnt);
> + if (len > 0) {
> + struct virtio_net_hdr *hdr
> + = (struct virtio_net_hdr *) vnet_hdr[0];
> +
> + /* Disable checksums on the card since we are on
> + a reliable PCIe link */
> + hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> +#ifdef DEBUG
> + mpsslog("%s %s %d hdr->flags 0x%x ", mic->name,
> + __func__, __LINE__, hdr->flags);
> + mpsslog("copy.out_len %d hdr->gso_type 0x%x\n",
> + copy.out_len, hdr->gso_type);
> +#endif
> +#ifdef DEBUG
> + disp_iovec(mic, &copy, __func__, __LINE__);
> + mpsslog("%s %s %d read from tap 0x%lx\n",
> + mic->name, __func__, __LINE__,
> + len);
> +#endif
> + wait_for_descriptors(mic, &tx_vr);
> + txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &copy,
> + len);
> +
> + err = mic_virtio_copy(mic,
> + mic->mic_net.virtio_net_fd, &tx_vr,
> + &copy);
> + if (err < 0) {
> + mpsslog("%s %s %d mic_virtio_copy %s\n",
> + mic->name, __func__, __LINE__,
> + strerror(errno));
> + }
> + if (!err)
> + verify_out_len(mic, &copy);
> +#ifdef DEBUG
> + disp_iovec(mic, &copy, __func__, __LINE__);
> + mpsslog("%s %s %d wrote to net 0x%lx\n",
> + mic->name, __func__, __LINE__,
> + sum_iovec_len(&copy));
> +#endif
> + /* Reinitialize IOV for next run */
> + iov0[1].iov_len = MAX_NET_PKT_SIZE;
> + } else if (len < 0) {
> + disp_iovec(mic, &copy, __func__, __LINE__);
> + mpsslog("%s %s %d read failed %s ", mic->name,
> + __func__, __LINE__, strerror(errno));
> + mpsslog("cnt %d sum %d\n",
> + copy.iovcnt, sum_iovec_len(&copy));
> + }
> + }
> +
> + /*
> + * Check if there is data to be read from virtio net and
> + * write to TUN if there is.
> + */
> + if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) {
> + while (rx_vr.info->avail_idx !=
> + le16toh(rx_vr.vr.avail->idx)) {
> + copy.iov = iov1;
> + txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &copy,
> + MAX_NET_PKT_SIZE
> + + sizeof(struct virtio_net_hdr));
> +
> + err = mic_virtio_copy(mic,
> + mic->mic_net.virtio_net_fd, &rx_vr,
> + &copy);
> + if (!err) {
> +#ifdef DEBUG
> + struct virtio_net_hdr *hdr
> + = (struct virtio_net_hdr *)
> + vnet_hdr[1];
> +
> + mpsslog("%s %s %d hdr->flags 0x%x, ",
> + mic->name, __func__, __LINE__,
> + hdr->flags);
> + mpsslog("out_len %d gso_type 0x%x\n",
> + copy.out_len,
> + hdr->gso_type);
> +#endif
> + /* Set the correct output iov_len */
> + iov1[1].iov_len = copy.out_len -
> + sizeof(struct virtio_net_hdr);
> + verify_out_len(mic, &copy);
> +#ifdef DEBUG
> + disp_iovec(mic, copy, __func__,
> + __LINE__);
> + mpsslog("%s %s %d ",
> + mic->name, __func__, __LINE__);
> + mpsslog("read from net 0x%lx\n",
> + sum_iovec_len(copy));
> +#endif
> + len = writev(net_poll[NET_FD_TUN].fd,
> + copy.iov, copy.iovcnt);
> + if (len != sum_iovec_len(&copy)) {
> + mpsslog("Tun write failed %s ",
> + strerror(errno));
> + mpsslog("len 0x%x ", len);
> + mpsslog("read_len 0x%x\n",
> + sum_iovec_len(&copy));
> + } else {
> +#ifdef DEBUG
> + disp_iovec(mic, &copy, __func__,
> + __LINE__);
> + mpsslog("%s %s %d ",
> + mic->name, __func__,
> + __LINE__);
> + mpsslog("wrote to tap 0x%lx\n",
> + len);
> +#endif
> + }
> + } else {
> + mpsslog("%s %s %d mic_virtio_copy %s\n",
> + mic->name, __func__, __LINE__,
> + strerror(errno));
> + break;
> + }
> + }
> + }
> + if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> + mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> + sleep(1);
> + }
> + }
> +done:
> + pthread_exit(NULL);
> +}
> +
> +/* virtio_console */
> +#define VIRTIO_CONSOLE_FD 0
> +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1)
> +#define MAX_CONSOLE_FD (MONITOR_FD + 1) /* must be the last one + 1 */
> +#define MAX_BUFFER_SIZE PAGE_SIZE
> +
> +static void *
> +virtio_console(void *arg)
> +{
> + static __u8 vcons_buf[2][PAGE_SIZE];
> + struct iovec vcons_iov[2] = {
> + { .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) },
> + { .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) },
> + };
> + struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1];
> + struct mic_info *mic = (struct mic_info *)arg;
> + int err;
> + struct pollfd console_poll[MAX_CONSOLE_FD];
> + int pty_fd;
> + char *pts_name;
> + ssize_t len;
> + struct mic_vring tx_vr, rx_vr;
> + struct mic_copy_desc copy;
> + struct mic_device_desc *desc;
> +
> + pty_fd = posix_openpt(O_RDWR);
> + if (pty_fd < 0) {
> + mpsslog("can't open a pseudoterminal master device: %s\n",
> + strerror(errno));
> + goto _return;
> + }
> + pts_name = ptsname(pty_fd);
> + if (pts_name == NULL) {
> + mpsslog("can't get pts name\n");
> + goto _close_pty;
> + }
> + printf("%s console message goes to %s\n", mic->name, pts_name);
> + mpsslog("%s console message goes to %s\n", mic->name, pts_name);
> + err = grantpt(pty_fd);
> + if (err < 0) {
> + mpsslog("can't grant access: %s %s\n",
> + pts_name, strerror(errno));
> + goto _close_pty;
> + }
> + err = unlockpt(pty_fd);
> + if (err < 0) {
> + mpsslog("can't unlock a pseudoterminal: %s %s\n",
> + pts_name, strerror(errno));
> + goto _close_pty;
> + }
> + console_poll[MONITOR_FD].fd = pty_fd;
> + console_poll[MONITOR_FD].events = POLLIN;
> +
> + console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd;
> + console_poll[VIRTIO_CONSOLE_FD].events = POLLIN;
> +
> + if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd,
> + VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr,
> + virtcons_dev_page.dd.num_vq)) {
> + mpsslog("%s init_vr failed %s\n",
> + mic->name, strerror(errno));
> + goto _close_pty;
> + }
> +
> + copy.iovcnt = 1;
> + desc = get_device_desc(mic, VIRTIO_ID_CONSOLE);
> +
> + for (;;) {
> + console_poll[MONITOR_FD].revents = 0;
> + console_poll[VIRTIO_CONSOLE_FD].revents = 0;
> + err = poll(console_poll, MAX_CONSOLE_FD, -1);
> + if (err < 0) {
> + mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__,
> + strerror(errno));
> + continue;
> + }
> + if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> + wait_for_card_driver(mic,
> + mic->mic_console.virtio_console_fd,
> + VIRTIO_ID_CONSOLE);
> +
> + if (console_poll[MONITOR_FD].revents & POLLIN) {
> + copy.iov = iov0;
> + len = readv(pty_fd, copy.iov, copy.iovcnt);
> + if (len > 0) {
> +#ifdef DEBUG
> + disp_iovec(mic, copy, __func__, __LINE__);
> + mpsslog("%s %s %d read from tap 0x%lx\n",
> + mic->name, __func__, __LINE__,
> + len);
> +#endif
> + wait_for_descriptors(mic, &tx_vr);
> + txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr,
> + &copy, len);
> +
> + err = mic_virtio_copy(mic,
> + mic->mic_console.virtio_console_fd,
> + &tx_vr, &copy);
> + if (err < 0) {
> + mpsslog("%s %s %d mic_virtio_copy %s\n",
> + mic->name, __func__, __LINE__,
> + strerror(errno));
> + }
> + if (!err)
> + verify_out_len(mic, &copy);
> +#ifdef DEBUG
> + disp_iovec(mic, copy, __func__, __LINE__);
> + mpsslog("%s %s %d wrote to net 0x%lx\n",
> + mic->name, __func__, __LINE__,
> + sum_iovec_len(copy));
> +#endif
> + /* Reinitialize IOV for next run */
> + iov0->iov_len = PAGE_SIZE;
> + } else if (len < 0) {
> + disp_iovec(mic, &copy, __func__, __LINE__);
> + mpsslog("%s %s %d read failed %s ",
> + mic->name, __func__, __LINE__,
> + strerror(errno));
> + mpsslog("cnt %d sum %d\n",
> + copy.iovcnt, sum_iovec_len(&copy));
> + }
> + }
> +
> + if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) {
> + while (rx_vr.info->avail_idx !=
> + le16toh(rx_vr.vr.avail->idx)) {
> + copy.iov = iov1;
> + txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr,
> + &copy, PAGE_SIZE);
> +
> + err = mic_virtio_copy(mic,
> + mic->mic_console.virtio_console_fd,
> + &rx_vr, &copy);
> + if (!err) {
> + /* Set the correct output iov_len */
> + iov1->iov_len = copy.out_len;
> + verify_out_len(mic, &copy);
> +#ifdef DEBUG
> + disp_iovec(mic, copy, __func__,
> + __LINE__);
> + mpsslog("%s %s %d ",
> + mic->name, __func__, __LINE__);
> + mpsslog("read from net 0x%lx\n",
> + sum_iovec_len(copy));
> +#endif
> + len = writev(pty_fd,
> + copy.iov, copy.iovcnt);
> + if (len != sum_iovec_len(&copy)) {
> + mpsslog("Tun write failed %s ",
> + strerror(errno));
> + mpsslog("len 0x%x ", len);
> + mpsslog("read_len 0x%x\n",
> + sum_iovec_len(&copy));
> + } else {
> +#ifdef DEBUG
> + disp_iovec(mic, copy, __func__,
> + __LINE__);
> + mpsslog("%s %s %d ",
> + mic->name, __func__,
> + __LINE__);
> + mpsslog("wrote to tap 0x%lx\n",
> + len);
> +#endif
> + }
> + } else {
> + mpsslog("%s %s %d mic_virtio_copy %s\n",
> + mic->name, __func__, __LINE__,
> + strerror(errno));
> + break;
> + }
> + }
> + }
> + if (console_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> + mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> + sleep(1);
> + }
> + }
> +_close_pty:
> + close(pty_fd);
> +_return:
> + pthread_exit(NULL);
> +}
> +
> +static void
> +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd)
> +{
> + char path[PATH_MAX];
> + int fd, err;
> +
> + snprintf(path, PATH_MAX, "/dev/mic%d", mic->id);
> + fd = open(path, O_RDWR);
> + if (fd < 0) {
> + mpsslog("Could not open %s %s\n", path, strerror(errno));
> + return;
> + }
> +
> + err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd);
> + if (err < 0) {
> + mpsslog("Could not add %d %s\n", dd->type, strerror(errno));
> + close(fd);
> + return;
> + }
> + switch (dd->type) {
> + case VIRTIO_ID_NET:
> + mic->mic_net.virtio_net_fd = fd;
> + mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name);
> + break;
> + case VIRTIO_ID_CONSOLE:
> + mic->mic_console.virtio_console_fd = fd;
> + mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name);
> + break;
> + case VIRTIO_ID_BLOCK:
> + mic->mic_virtblk.virtio_block_fd = fd;
> + mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name);
> + break;
> + }
> +}
> +
> +static bool
> +set_backend_file(struct mic_info *mic)
> +{
> + FILE *config;
> + char buff[PATH_MAX], *line, *evv, *p;
> +
> + snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id);
> + config = fopen(buff, "r");
> + if (config == NULL)
> + return false;
> + do { /* look for "virtblk_backend=XXXX" */
> + line = fgets(buff, PATH_MAX, config);
> + if (line == NULL)
> + break;
> + if (*line == '#')
> + continue;
> + p = strchr(line, '\n');
> + if (p)
> + *p = '\0';
> + } while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0);
> + fclose(config);
> + if (line == NULL)
> + return false;
> + evv = strchr(line, '=');
> + if (evv == NULL)
> + return false;
> + mic->mic_virtblk.backend_file = malloc(strlen(evv));
> + if (mic->mic_virtblk.backend_file == NULL) {
> + mpsslog("can't allocate memory\n", mic->name, mic->id);
> + return false;
> + }
> + strcpy(mic->mic_virtblk.backend_file, evv + 1);
> + return true;
> +}
> +
> +#define SECTOR_SIZE 512
> +static bool
> +set_backend_size(struct mic_info *mic)
> +{
> + mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0,
> + SEEK_END);
> + if (mic->mic_virtblk.backend_size < 0) {
> + mpsslog("%s: can't seek: %s\n",
> + mic->name, mic->mic_virtblk.backend_file);
> + return false;
> + }
> + virtblk_dev_page.blk_config.capacity =
> + mic->mic_virtblk.backend_size / SECTOR_SIZE;
> + if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0)
> + virtblk_dev_page.blk_config.capacity++;
> +
> + virtblk_dev_page.blk_config.capacity =
> + htole64(virtblk_dev_page.blk_config.capacity);
> +
> + return true;
> +}
> +
> +static bool
> +open_backend(struct mic_info *mic)
> +{
> + if (!set_backend_file(mic))
> + goto _error_exit;
> + mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR);
> + if (mic->mic_virtblk.backend < 0) {
> + mpsslog("%s: can't open: %s\n", mic->name,
> + mic->mic_virtblk.backend_file);
> + goto _error_free;
> + }
> + if (!set_backend_size(mic))
> + goto _error_close;
> + mic->mic_virtblk.backend_addr = mmap(NULL,
> + mic->mic_virtblk.backend_size,
> + PROT_READ|PROT_WRITE, MAP_SHARED,
> + mic->mic_virtblk.backend, 0L);
> + if (mic->mic_virtblk.backend_addr == MAP_FAILED) {
> + mpsslog("%s: can't map: %s %s\n",
> + mic->name, mic->mic_virtblk.backend_file,
> + strerror(errno));
> + goto _error_close;
> + }
> + return true;
> +
> + _error_close:
> + close(mic->mic_virtblk.backend);
> + _error_free:
> + free(mic->mic_virtblk.backend_file);
> + _error_exit:
> + return false;
> +}
> +
> +static void
> +close_backend(struct mic_info *mic)
> +{
> + munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size);
> + close(mic->mic_virtblk.backend);
> + free(mic->mic_virtblk.backend_file);
> +}
> +
> +static bool
> +start_virtblk(struct mic_info *mic, struct mic_vring *vring)
> +{
> + if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) {
> + mpsslog("%s: blk_config is not 8 byte aligned.\n",
> + mic->name);
> + return false;
> + }
> + add_virtio_device(mic, &virtblk_dev_page.dd);
> + if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd,
> + VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) {
> + mpsslog("%s init_vr failed %s\n",
> + mic->name, strerror(errno));
> + return false;
> + }
> + return true;
> +}
> +
> +static void
> +stop_virtblk(struct mic_info *mic)
> +{
> + uninit_vr(mic, virtblk_dev_page.dd.num_vq);
> + close(mic->mic_virtblk.virtio_block_fd);
> +}
> +
> +static __u8
> +header_error_check(struct vring_desc *desc)
> +{
> + if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) {
> + mpsslog("%s() %d: length is not sizeof(virtio_blk_outhd)\n",
> + __func__, __LINE__);
> + return -EIO;
> + }
> + if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) {
> + mpsslog("%s() %d: alone\n",
> + __func__, __LINE__);
> + return -EIO;
> + }
> + if (le16toh(desc->flags) & VRING_DESC_F_WRITE) {
> + mpsslog("%s() %d: not read\n",
> + __func__, __LINE__);
> + return -EIO;
> + }
> + return 0;
> +}
> +
> +static int
> +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx)
> +{
> + struct iovec iovec;
> + struct mic_copy_desc copy;
> +
> + iovec.iov_len = sizeof(*hdr);
> + iovec.iov_base = hdr;
> + copy.iov = &iovec;
> + copy.iovcnt = 1;
> + copy.vr_idx = 0; /* only one vring on virtio_block */
> + copy.update_used = false; /* do not update used index */
> + return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static int
> +transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt)
> +{
> + struct mic_copy_desc copy;
> +
> + copy.iov = iovec;
> + copy.iovcnt = iovcnt;
> + copy.vr_idx = 0; /* only one vring on virtio_block */
> + copy.update_used = false; /* do not update used index */
> + return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static __u8
> +status_error_check(struct vring_desc *desc)
> +{
> + if (le32toh(desc->len) != sizeof(__u8)) {
> + mpsslog("%s() %d: length is not sizeof(status)\n",
> + __func__, __LINE__);
> + return -EIO;
> + }
> + return 0;
> +}
> +
> +static int
> +write_status(int fd, __u8 *status)
> +{
> + struct iovec iovec;
> + struct mic_copy_desc copy;
> +
> + iovec.iov_base = status;
> + iovec.iov_len = sizeof(*status);
> + copy.iov = &iovec;
> + copy.iovcnt = 1;
> + copy.vr_idx = 0; /* only one vring on virtio_block */
> + copy.update_used = true; /* Update used index */
> + return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static void *
> +virtio_block(void *arg)
> +{
> + struct mic_info *mic = (struct mic_info *) arg;
> + int ret;
> + struct pollfd block_poll;
> + struct mic_vring vring;
> + __u16 avail_idx;
> + __u32 desc_idx;
> + struct vring_desc *desc;
> + struct iovec *iovec, *piov;
> + __u8 status;
> + __u32 buffer_desc_idx;
> + struct virtio_blk_outhdr hdr;
> + void *fos;
> +
> + for (;;) { /* forever */
> + if (!open_backend(mic)) { /* No virtblk */
> + for (mic->mic_virtblk.signaled = 0;
> + !mic->mic_virtblk.signaled;)
> + sleep(1);
> + continue;
> + }
> +
> + /* backend file is specified. */
> + if (!start_virtblk(mic, &vring))
> + goto _close_backend;
> + iovec = malloc(sizeof(*iovec) *
> + le32toh(virtblk_dev_page.blk_config.seg_max));
> + if (!iovec) {
> + mpsslog("%s: can't alloc iovec: %s\n",
> + mic->name, strerror(ENOMEM));
> + goto _stop_virtblk;
> + }
> +
> + block_poll.fd = mic->mic_virtblk.virtio_block_fd;
> + block_poll.events = POLLIN;
> + for (mic->mic_virtblk.signaled = 0;
> + !mic->mic_virtblk.signaled;) {
> + block_poll.revents = 0;
> + /* timeout in 1 sec to see signaled */
> + ret = poll(&block_poll, 1, 1000);
> + if (ret < 0) {
> + mpsslog("%s %d: poll failed: %s\n",
> + __func__, __LINE__,
> + strerror(errno));
> + continue;
> + }
> +
> + if (!(block_poll.revents & POLLIN)) {
> +#ifdef DEBUG
> + mpsslog("%s %d: block_poll.revents=0x%x\n",
> + __func__, __LINE__, block_poll.revents);
> + sleep(1);
> +#endif
> + continue;
> + }
> +
> + /* POLLIN */
> + while (vring.info->avail_idx !=
> + le16toh(vring.vr.avail->idx)) {
> + /* read header element */
> + avail_idx =
> + vring.info->avail_idx &
> + (vring.vr.num - 1);
> + desc_idx = le16toh(
> + vring.vr.avail->ring[avail_idx]);
> + desc = &vring.vr.desc[desc_idx];
> +#ifdef DEBUG
> + mpsslog("%s() %d: avail_idx=%d ",
> + __func__, __LINE__,
> + vring.info->avail_idx);
> + mpsslog("vring.vr.num=%d desc=%p\n",
> + vring.vr.num, desc);
> +#endif
> + status = header_error_check(desc);
> + ret = read_header(
> + mic->mic_virtblk.virtio_block_fd,
> + &hdr, desc_idx);
> + if (ret < 0) {
> + mpsslog("%s() %d %s: ret=%d %s\n",
> + __func__, __LINE__,
> + mic->name, ret,
> + strerror(errno));
> + break;
> + }
> + /* buffer element */
> + piov = iovec;
> + status = 0;
> + fos = mic->mic_virtblk.backend_addr +
> + (hdr.sector * SECTOR_SIZE);
> + buffer_desc_idx = desc_idx =
> + next_desc(desc);
> + for (desc = &vring.vr.desc[buffer_desc_idx];
> + desc->flags & VRING_DESC_F_NEXT;
> + desc_idx = next_desc(desc),
> + desc = &vring.vr.desc[desc_idx]) {
> + piov->iov_len = desc->len;
> + piov->iov_base = fos;
> + piov++;
> + fos += desc->len;
> + }
> + /* Returning NULLs for VIRTIO_BLK_T_GET_ID. */
> + if (hdr.type & ~(VIRTIO_BLK_T_OUT |
> + VIRTIO_BLK_T_GET_ID)) {
> + /*
> + VIRTIO_BLK_T_IN - does not do
> + anything. Probably for documenting.
> + VIRTIO_BLK_T_SCSI_CMD - for
> + virtio_scsi.
> + VIRTIO_BLK_T_FLUSH - turned off in
> + config space.
> + VIRTIO_BLK_T_BARRIER - defined but not
> + used in anywhere.
> + */
> + mpsslog("%s() %d: type %x ",
> + __func__, __LINE__,
> + hdr.type);
> + mpsslog("is not supported\n");
> + status = -ENOTSUP;
> +
> + } else {
> + ret = transfer_blocks(
> + mic->mic_virtblk.virtio_block_fd,
> + iovec,
> + piov - iovec);
> + if (ret < 0 &&
> + status != 0)
> + status = ret;
> + }
> + /* write status and update used pointer */
> + if (status != 0)
> + status = status_error_check(desc);
> + ret = write_status(
> + mic->mic_virtblk.virtio_block_fd,
> + &status);
> +#ifdef DEBUG
> + mpsslog("%s() %d: write status=%d on desc=%p\n",
> + __func__, __LINE__,
> + status, desc);
> +#endif
> + }
> + }
> + free(iovec);
> +_stop_virtblk:
> + stop_virtblk(mic);
> +_close_backend:
> + close_backend(mic);
> + } /* forever */
> +
> + pthread_exit(NULL);
> +}
> +
> +static void
> +reset(struct mic_info *mic)
> +{
> +#define RESET_TIMEOUT 120
> + int i = RESET_TIMEOUT;
> + setsysfs(mic->name, "state", "reset");
> + while (i) {
> + char *state;
> + state = readsysfs(mic->name, "state");
> + if (!state)
> + goto retry;
> + mpsslog("%s: %s %d state %s\n",
> + mic->name, __func__, __LINE__, state);
> + if ((!strcmp(state, "offline"))) {
> + free(state);
> + break;
> + }
> + free(state);
> +retry:
> + sleep(1);
> + i--;
> + }
> +}
> +
> +static int
> +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status)
> +{
> + if (!strcmp(shutdown_status, "nop"))
> + return MIC_NOP;
> + if (!strcmp(shutdown_status, "crashed"))
> + return MIC_CRASHED;
> + if (!strcmp(shutdown_status, "halted"))
> + return MIC_HALTED;
> + if (!strcmp(shutdown_status, "poweroff"))
> + return MIC_POWER_OFF;
> + if (!strcmp(shutdown_status, "restart"))
> + return MIC_RESTART;
> + mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status);
> + /* Invalid state */
> + assert(0);
> +};
> +
> +static int get_mic_state(struct mic_info *mic, char *state)
> +{
> + if (!strcmp(state, "offline"))
> + return MIC_OFFLINE;
> + if (!strcmp(state, "online"))
> + return MIC_ONLINE;
> + if (!strcmp(state, "shutting_down"))
> + return MIC_SHUTTING_DOWN;
> + if (!strcmp(state, "reset_failed"))
> + return MIC_RESET_FAILED;
> + mpsslog("%s: BUG invalid state %s\n", mic->name, state);
> + /* Invalid state */
> + assert(0);
> +};
> +
> +static void mic_handle_shutdown(struct mic_info *mic)
> +{
> +#define SHUTDOWN_TIMEOUT 60
> + int i = SHUTDOWN_TIMEOUT, ret, stat = 0;
> + char *shutdown_status;
> + while (i) {
> + shutdown_status = readsysfs(mic->name, "shutdown_status");
> + if (!shutdown_status)
> + continue;
> + mpsslog("%s: %s %d shutdown_status %s\n",
> + mic->name, __func__, __LINE__, shutdown_status);
> + switch (get_mic_shutdown_status(mic, shutdown_status)) {
> + case MIC_RESTART:
> + mic->restart = 1;
> + case MIC_HALTED:
> + case MIC_POWER_OFF:
> + case MIC_CRASHED:
> + goto reset;
> + default:
> + break;
> + }
> + free(shutdown_status);
> + sleep(1);
> + i--;
> + }
> +reset:
> + ret = kill(mic->pid, SIGTERM);
> + mpsslog("%s: %s %d kill pid %d ret %d\n",
> + mic->name, __func__, __LINE__,
> + mic->pid, ret);
> + if (!ret) {
> + ret = waitpid(mic->pid, &stat,
> + WIFSIGNALED(stat));
> + mpsslog("%s: %s %d waitpid ret %d pid %d\n",
> + mic->name, __func__, __LINE__,
> + ret, mic->pid);
> + }
> + if (ret == mic->pid)
> + reset(mic);
> +}
> +
> +static void *
> +mic_config(void *arg)
> +{
> + struct mic_info *mic = (struct mic_info *)arg;
> + char *state = NULL;
> + char pathname[PATH_MAX];
> + int fd, ret;
> + struct pollfd ufds[1];
> + char value[4096];
> +
> + snprintf(pathname, PATH_MAX - 1, "%s/%s/%s",
> + MICSYSFSDIR, mic->name, "state");
> +
> + fd = open(pathname, O_RDONLY);
> + if (fd < 0) {
> + mpsslog("%s: opening file %s failed %s\n",
> + mic->name, pathname, strerror(errno));
> + goto error;
> + }
> +
> + do {
> + ret = read(fd, value, sizeof(value));
> + if (ret < 0) {
> + mpsslog("%s: Failed to read sysfs entry '%s': %s\n",
> + mic->name, pathname, strerror(errno));
> + goto close_error1;
> + }
> +retry:
> + state = readsysfs(mic->name, "state");
> + if (!state)
> + goto retry;
> + mpsslog("%s: %s %d state %s\n",
> + mic->name, __func__, __LINE__, state);
> + switch (get_mic_state(mic, state)) {
> + case MIC_SHUTTING_DOWN:
> + mic_handle_shutdown(mic);
> + goto close_error;
> + default:
> + break;
> + }
> + free(state);
> +
> + ufds[0].fd = fd;
> + ufds[0].events = POLLERR | POLLPRI;
> + ret = poll(ufds, 1, -1);
> + if (ret < 0) {
> + mpsslog("%s: poll failed %s\n",
> + mic->name, strerror(errno));
> + goto close_error1;
> + }
> + } while (1);
> +close_error:
> + free(state);
> +close_error1:
> + close(fd);
> +error:
> + init_mic(mic);
> + pthread_exit(NULL);
> +}
> +
> +static void
> +set_cmdline(struct mic_info *mic)
> +{
> + char buffer[PATH_MAX];
> + int len;
> +
> + len = snprintf(buffer, PATH_MAX,
> + "clocksource=tsc highres=off nohz=off ");
> + len += snprintf(buffer + len, PATH_MAX,
> + "cpufreq_on;corec6_off;pc3_off;pc6_off ");
> + len += snprintf(buffer + len, PATH_MAX,
> + "ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
> + mic->id);
> +
> + setsysfs(mic->name, "cmdline", buffer);
> + mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer);
> + snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id);
> + mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer);
> +}
> +
> +static void
> +set_log_buf_info(struct mic_info *mic)
> +{
> + int fd;
> + off_t len;
> + char system_map[] = "/lib/firmware/mic/System.map";
> + char *map, *temp, log_buf[17] = {'\0'};
> +
> + fd = open(system_map, O_RDONLY);
> + if (fd < 0) {
> + mpsslog("%s: Opening System.map failed: %d\n",
> + mic->name, errno);
> + return;
> + }
> + len = lseek(fd, 0, SEEK_END);
> + if (len < 0) {
> + mpsslog("%s: Reading System.map size failed: %d\n",
> + mic->name, errno);
> + close(fd);
> + return;
> + }
> + map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
> + if (map == MAP_FAILED) {
> + mpsslog("%s: mmap of System.map failed: %d\n",
> + mic->name, errno);
> + close(fd);
> + return;
> + }
> + temp = strstr(map, "__log_buf");
> + if (!temp) {
> + mpsslog("%s: __log_buf not found: %d\n", mic->name, errno);
> + munmap(map, len);
> + close(fd);
> + return;
> + }
> + strncpy(log_buf, temp - 19, 16);
> + setsysfs(mic->name, "log_buf_addr", log_buf);
> + mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf);
> + temp = strstr(map, "log_buf_len");
> + if (!temp) {
> + mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno);
> + munmap(map, len);
> + close(fd);
> + return;
> + }
> + strncpy(log_buf, temp - 19, 16);
> + setsysfs(mic->name, "log_buf_len", log_buf);
> + mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf);
> + munmap(map, len);
> + close(fd);
> +}
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static void
> +change_virtblk_backend(int x, siginfo_t *siginfo, void *p)
> +{
> + struct mic_info *mic;
> +
> + for (mic = mic_list.next; mic != NULL; mic = mic->next)
> + mic->mic_virtblk.signaled = 1/* true */;
> +}
> +
> +static void
> +init_mic(struct mic_info *mic)
> +{
> + struct sigaction ignore = {
> + .sa_flags = 0,
> + .sa_handler = SIG_IGN
> + };
> + struct sigaction act = {
> + .sa_flags = SA_SIGINFO,
> + .sa_sigaction = change_virtblk_backend,
> + };
> + char buffer[PATH_MAX];
> + int err;
> +
> + /* ignore SIGUSR1 for both process */
> + sigaction(SIGUSR1, &ignore, NULL);
> +
> + mic->pid = fork();
> + switch (mic->pid) {
> + case 0:
> + set_log_buf_info(mic);
> + set_cmdline(mic);
> + add_virtio_device(mic, &virtcons_dev_page.dd);
> + add_virtio_device(mic, &virtnet_dev_page.dd);
> + err = pthread_create(&mic->mic_console.console_thread, NULL,
> + virtio_console, mic);
> + if (err)
> + mpsslog("%s virtcons pthread_create failed %s\n",
> + mic->name, strerror(err));
> + /*
> + * TODO: Debug why not adding this sleep results in the tap
> + * interface not coming up during certain runs sporadically.
> + */

Indeed.

> + usleep(1000);
> + err = pthread_create(&mic->mic_net.net_thread, NULL,
> + virtio_net, mic);
> + if (err)
> + mpsslog("%s virtnet pthread_create failed %s\n",
> + mic->name, strerror(err));
> + err = pthread_create(&mic->mic_virtblk.block_thread, NULL,
> + virtio_block, mic);
> + if (err)
> + mpsslog("%s virtblk pthread_create failed %s\n",
> + mic->name, strerror(err));
> + sigemptyset(&act.sa_mask);
> + err = sigaction(SIGUSR1, &act, NULL);

Confused. Who sends this SIGUSR1 here?


> + if (err)
> + mpsslog("%s sigaction SIGUSR1 failed %s\n",
> + mic->name, strerror(errno));
> + while (1)
> + sleep(60);
> + case -1:
> + mpsslog("fork failed MIC name %s id %d errno %d\n",
> + mic->name, mic->id, errno);
> + break;
> + default:
> + if (mic->restart) {
> + snprintf(buffer, PATH_MAX,
> + "boot:linux:mic/uos.img:mic/mic%d.image",
> + mic->id);
> + setsysfs(mic->name, "state", buffer);
> + mpsslog("%s restarting mic %d\n",
> + mic->name, mic->restart);
> + mic->restart = 0;
> + }
> + pthread_create(&mic->config_thread, NULL, mic_config, mic);
> + }
> +}
> +
> +static void
> +start_daemon(void)
> +{
> + struct mic_info *mic;
> +
> + for (mic = mic_list.next; mic != NULL; mic = mic->next)
> + init_mic(mic);
> +
> + while (1)
> + sleep(60);
> +}
> +
> +static int
> +init_mic_list(void)
> +{
> + struct mic_info *mic = &mic_list;
> + struct dirent *file;
> + DIR *dp;
> + int cnt = 0;
> +
> + dp = opendir(MICSYSFSDIR);
> + if (!dp)
> + return 0;
> +
> + while ((file = readdir(dp)) != NULL) {
> + if (!strncmp(file->d_name, "mic", 3)) {
> + mic->next = malloc(sizeof(struct mic_info));
> + if (mic->next) {
> + mic = mic->next;
> + mic->next = NULL;
> + memset(mic, 0, sizeof(struct mic_info));
> + mic->id = atoi(&file->d_name[3]);
> + mic->name = malloc(strlen(file->d_name) + 16);
> + if (mic->name)
> + strcpy(mic->name, file->d_name);
> + mpsslog("MIC name %s id %d\n", mic->name,
> + mic->id);
> + cnt++;
> + }
> + }
> + }
> +
> + closedir(dp);
> + return cnt;
> +}
> +
> +void
> +mpsslog(char *format, ...)
> +{
> + va_list args;
> + char buffer[4096];
> + time_t t;
> + char *ts;
> +
> + if (logfp == NULL)
> + return;
> +
> + va_start(args, format);
> + vsprintf(buffer, format, args);
> + va_end(args);
> +
> + time(&t);
> + ts = ctime(&t);
> + ts[strlen(ts) - 1] = '\0';
> + fprintf(logfp, "%s: %s", ts, buffer);
> +
> + fflush(logfp);
> +}
> +
> +int
> +main(int argc, char *argv[])
> +{
> + int cnt;
> +
> + myname = argv[0];
> +
> + logfp = fopen(LOGFILE_NAME, "a+");
> + if (!logfp) {
> + fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME);
> + exit(1);
> + }
> +
> + mpsslog("MIC Daemon start\n");
> +
> + cnt = init_mic_list();
> + if (cnt == 0) {
> + mpsslog("MIC module not loaded\n");
> + exit(2);
> + }
> + mpsslog("MIC found %d devices\n", cnt);
> +
> + start_daemon();
> +
> + exit(0);
> +}
> diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h
> new file mode 100644
> index 0000000..b6dee38
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.h
> @@ -0,0 +1,100 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +#ifndef _MPSSD_H_
> +#define _MPSSD_H_
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <dirent.h>
> +#include <libgen.h>
> +#include <pthread.h>
> +#include <stdarg.h>
> +#include <time.h>
> +#include <errno.h>
> +#include <sys/dir.h>
> +#include <sys/ioctl.h>
> +#include <sys/poll.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/utsname.h>
> +#include <sys/wait.h>
> +#include <netinet/in.h>
> +#include <arpa/inet.h>
> +#include <netdb.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <limits.h>
> +#include <syslog.h>
> +#include <getopt.h>
> +#include <net/if.h>
> +#include <linux/if_tun.h>
> +#include <linux/if_tun.h>
> +#include <linux/virtio_ids.h>
> +
> +#define MICSYSFSDIR "/sys/class/mic"
> +#define LOGFILE_NAME "/var/log/mpssd"
> +#define PAGE_SIZE 4096
> +
> +struct mic_console_info {
> + pthread_t console_thread;
> + int virtio_console_fd;
> + void *console_dp;
> +};
> +
> +struct mic_net_info {
> + pthread_t net_thread;
> + int virtio_net_fd;
> + int tap_fd;
> + void *net_dp;
> +};
> +
> +struct mic_virtblk_info {
> + pthread_t block_thread;
> + int virtio_block_fd;
> + void *block_dp;
> + volatile sig_atomic_t signaled;
> + char *backend_file;
> + int backend;
> + void *backend_addr;
> + long backend_size;
> +};
> +
> +struct mic_info {
> + int id;
> + char *name;
> + pthread_t config_thread;
> + pid_t pid;
> + struct mic_console_info mic_console;
> + struct mic_net_info mic_net;
> + struct mic_virtblk_info mic_virtblk;
> + int restart;
> + struct mic_info *next;
> +};
> +
> +void mpsslog(char *format, ...);
> +char *readsysfs(char *dir, char *entry);
> +int setsysfs(char *dir, char *entry, char *value);
> +#endif
> diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c
> new file mode 100644
> index 0000000..3244dcf
> --- /dev/null
> +++ b/Documentation/mic/mpssd/sysfs.c
> @@ -0,0 +1,103 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#include "mpssd.h"
> +
> +#define PAGE_SIZE 4096
> +
> +char *
> +readsysfs(char *dir, char *entry)
> +{
> + char filename[PATH_MAX];
> + char value[PAGE_SIZE];
> + char *string = NULL;
> + int fd;
> + int len;
> +
> + if (dir == NULL)
> + snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> + else
> + snprintf(filename, PATH_MAX,
> + "%s/%s/%s", MICSYSFSDIR, dir, entry);
> +
> + fd = open(filename, O_RDONLY);
> + if (fd < 0) {
> + mpsslog("Failed to open sysfs entry '%s': %s\n",
> + filename, strerror(errno));
> + return NULL;
> + }
> +
> + len = read(fd, value, sizeof(value));
> + if (len < 0) {
> + mpsslog("Failed to read sysfs entry '%s': %s\n",
> + filename, strerror(errno));
> + goto readsys_ret;
> + }
> +
> + value[len] = '\0';

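Why are you careful to NUL-terminate the buffer here but not in setsysfs()
below, where oldvalue is passed straight to strcmp()?

If you keep this, I'd also fail on len == sizeof(value): in that case
value[len] is written one byte past the end of the buffer.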
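
To make the point concrete, here is a rough sketch of what I have in mind.
The helper name read_text() and the choice of EOVERFLOW are my own
assumptions, not something taken from the patch. The idea is only that a
read which fills the whole buffer is treated as an error, so the
terminating NUL always fits:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): read a small text file,
 * such as a sysfs attribute, into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t read_text(const char *path, char *buf, size_t size)
{
	ssize_t len;
	int fd, saved;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	len = read(fd, buf, size);
	saved = errno;		/* close() may clobber errno */
	close(fd);

	if (len < 0) {
		errno = saved;
		return -1;
	}
	if ((size_t)len >= size) {
		/* No room left for the terminating NUL: fail. */
		errno = EOVERFLOW;	/* assumed error code */
		return -1;
	}
	buf[len] = '\0';
	return len;
}
```

Both readsysfs() and setsysfs() could presumably share something like
this, which would also make the strcmp() against oldvalue well defined.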

> +
> + string = malloc(strlen(value) + 1);
> + if (string)
> + strcpy(string, value);
> +
> +readsys_ret:
> + close(fd);
> + return string;
> +}
> +
> +int
> +setsysfs(char *dir, char *entry, char *value)
> +{
> + char filename[PATH_MAX];
> + char oldvalue[PAGE_SIZE];
> + int fd;
> +
> + if (dir == NULL)
> + snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> + else
> + snprintf(filename, PATH_MAX, "%s/%s/%s",
> + MICSYSFSDIR, dir, entry);
> +
> + fd = open(filename, O_RDWR);
> + if (fd < 0) {
> + mpsslog("Failed to open sysfs entry '%s': %s\n",
> + filename, strerror(errno));
> + return errno;
> + }
> +
> + if (read(fd, oldvalue, sizeof(oldvalue)) < 0) {
> + mpsslog("Failed to read sysfs entry '%s': %s\n",
> + filename, strerror(errno));
> + close(fd);
> + return errno;
> + }
> +
> + if (strcmp(value, oldvalue)) {
> + if (write(fd, value, strlen(value)) < 0) {
> + mpsslog("Failed to write new sysfs entry '%s': %s\n",
> + filename, strerror(errno));
> + close(fd);
> + return errno;
> + }
> + }
> +
> + close(fd);
> + return 0;
> +}
> --
> 1.8.2.1