Re: [PATCH 6/7] qemu: Implement virtio-pstore device

From: Namhyung Kim
Date: Thu Jul 28 2016 - 01:40:31 EST


On Thu, Jul 28, 2016 at 03:02:54AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 28, 2016 at 12:08:30AM +0900, Namhyung Kim wrote:
> > Add virtio pstore device to allow kernel log files saved on the host.
> > It will save the log files on the directory given by pstore device
> > option.
> >
> > $ qemu-system-x86_64 -device virtio-pstore,directory=dir-xx ...
> >
> > (guest) # echo c > /proc/sysrq-trigger
>
> So if the point is handling system crashes, I suspect
> virtio is the wrong protocol to use. ATM it's rather
> elaborate for performance, it's likely not to work when
> linux is crashing. I think you want something very very
> simple that will still work when you crash. Like maybe
> a serial device?

Well, I dont' know. As you know, the kernel oops dump is already sent
to serial device but it's rather slow. As I wrote in the cover
letter, enabling ftrace_dump_on_oops makes it even worse.. Also
pstore saves the (compressed) binary data so I thought it'd be better
to have a dedicated IO channel.

I know that we can't rely on anything in kernel when the kernel is
crashing. In the virtualization environment, I think it'd be great if
it can dump the crash data in the host directly so I chose the virtio.
I thought the virtio is a relatively thin layer and independent from
other kernel features. Even if it doesn't guarantee to work 100%,
it's worth trying IMHO.


>
> > $ ls dir-xx
> > dmesg-1.enc.z dmesg-2.enc.z
> >
> > The log files are usually compressed using zlib. Users can see the log
> > messages directly on the host or on the guest (using pstore filesystem).
>
> So this lacks all management tools that a regular storage device
> has, and these are necessary if people are to use this
> in production.
>
> For example, some kind of provision for limiting the amount of host disk
> this can consume seems called for. Rate-limiting disk writes
> on host also seems necessary. Handling host disk errors always
> propagates error to guest but an option to e.g. stop vm might
> be useful to avoid corruption.

Yes, it needs that kind of management. I'd like to look at the
existing implementation. But basically I thought it as a debugging
feature and it won't produce much data for the default setting. Maybe
I can limit the file size not to exceed the buffer size and keep up to
pre-configured number of files for each type. Now host disk error
will be propagated to guest.


>
> >
> > The 'directory' property is required for virtio-pstore device to work.
> > It also adds 'bufsize' and 'console' (boolean) properties.
>
> No idea what these do. Seem to be RW by guest but have no effect
> otherwise.

The 'directory' is a pathname which it saves the data files. The
'bufsize' sets the size of buffer used by pstore. The 'console'
enables saving pstore console type data.


>
>
> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > Cc: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
> > Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> > Cc: Anthony Liguori <aliguori@xxxxxxxxxx>
> > Cc: Anton Vorontsov <anton@xxxxxxxxxx>
> > Cc: Colin Cross <ccross@xxxxxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > Cc: Tony Luck <tony.luck@xxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Minchan Kim <minchan@xxxxxxxxxx>
> > Cc: kvm@xxxxxxxxxxxxxxx
> > Cc: qemu-devel@xxxxxxxxxx
> > Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > ---

[SNIP]
> > diff --git a/hw/virtio/virtio-pstore.c b/hw/virtio/virtio-pstore.c
> > new file mode 100644
> > index 0000000..2ca7786
> > --- /dev/null
> > +++ b/hw/virtio/virtio-pstore.c
> > @@ -0,0 +1,477 @@
> > +/*
> > + * Virtio Pstore Device
> > + *
> > + * Copyright (C) 2016 LG Electronics
> > + *
> > + * Authors:
> > + * Namhyung Kim <namhyung@xxxxxxxxx>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
>
> We generally ask new code to be v2 or later.

Ok, will update.


>
> > See
> > + * the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include <stdio.h>
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/iov.h"
> > +#include "qemu-common.h"
> > +#include "qemu/cutils.h"
> > +#include "qemu/error-report.h"
> > +#include "sysemu/kvm.h"
> > +#include "qapi/visitor.h"
> > +#include "qapi-event.h"
> > +#include "trace.h"
> > +
> > +#include "hw/virtio/virtio.h"
> > +#include "hw/virtio/virtio-bus.h"
> > +#include "hw/virtio/virtio-access.h"
> > +#include "hw/virtio/virtio-pstore.h"
> > +
> > +
> > +static void virtio_pstore_to_filename(VirtIOPstore *s, char *buf, size_t sz,
> > + struct virtio_pstore_req *req)
> > +{
> > + const char *basename;
> > + unsigned long long id = 0;
> > + unsigned int flags = le32_to_cpu(req->flags);
> > +
> > + switch (le16_to_cpu(req->type)) {
> > + case VIRTIO_PSTORE_TYPE_DMESG:
> > + basename = "dmesg";
> > + id = s->id++;
> > + break;
> > + case VIRTIO_PSTORE_TYPE_CONSOLE:
> > + basename = "console";
> > + if (s->console_id) {
> > + id = s->console_id;
> > + } else {
> > + id = s->console_id = s->id++;
> > + }
> > + break;
> > + default:
> > + basename = "unknown";
> > + break;
> > + }
> > +
> > + snprintf(buf, sz, "%s/%s-%llu%s", s->directory, basename, id,
> > + flags & VIRTIO_PSTORE_FL_COMPRESSED ? ".enc.z" : "");
> > +}
> > +
> > +static void virtio_pstore_from_filename(VirtIOPstore *s, char *name,
> > + char *buf, size_t sz,
> > + struct virtio_pstore_fileinfo *info)
> > +{
> > + snprintf(buf, sz, "%s/%s", s->directory, name);
>
> if this does not fit, buf will not be 0 terminated and can
> generally be corrupted.

Will change it to use malloc instead.

>
>
> > +
> > + if (g_str_has_prefix(name, "dmesg-")) {
> > + info->type = VIRTIO_PSTORE_TYPE_DMESG;
> > + name += strlen("dmesg-");
> > + } else if (g_str_has_prefix(name, "console-")) {
> > + info->type = VIRTIO_PSTORE_TYPE_CONSOLE;
> > + name += strlen("console-");
> > + } else if (g_str_has_prefix(name, "unknown-")) {
> > + info->type = VIRTIO_PSTORE_TYPE_UNKNOWN;
> > + name += strlen("unknown-");
> > + }
> > +
> > + qemu_strtoull(name, NULL, 0, &info->id);
> > +
> > + info->flags = 0;
> > + if (g_str_has_suffix(name, ".enc.z")) {
> > + info->flags |= VIRTIO_PSTORE_FL_COMPRESSED;
> > + }
> > +}

[SNIP]
> > +static ssize_t virtio_pstore_do_write(VirtIOPstore *s, struct iovec *out_sg,
> > + unsigned int out_num,
> > + struct virtio_pstore_req *req)
> > +{
> > + char path[PATH_MAX];
> > + int fd;
> > + ssize_t len;
> > + unsigned short type;
> > + int flags = O_WRONLY | O_CREAT;
> > +
> > + /* we already consume the req */
> > + iov_discard_front(&out_sg, &out_num, sizeof(*req));
> > +
> > + virtio_pstore_to_filename(s, path, sizeof(path), req);
> > +
> > + type = le16_to_cpu(req->type);
> > +
> > + if (type == VIRTIO_PSTORE_TYPE_DMESG) {
> > + flags |= O_TRUNC;
> > + } else if (type == VIRTIO_PSTORE_TYPE_CONSOLE) {
> > + flags |= O_APPEND;
> > + }
> > +
> > + fd = open(path, flags, 0644);
> > + if (fd < 0) {
> > + error_report("cannot open %s", path);
> > + return -1;
> > + }
> > + len = writev(fd, out_sg, out_num);
> > + close(fd);
> > +
> > + return len;
>
> All this is blocking VM until host io completes.

Hmm.. I don't know about the internals of qemu. So does it make guest
stop? If so, that's what I want to do for _DMESG. :) As it's called
only on kernel oops I think it's admittable. But for _CONSOLE, it
needs to do asynchronously. Maybe I can add a thread to do the work.


>
>
> > +}
> > +
> > +static ssize_t virtio_pstore_do_close(VirtIOPstore *s)
> > +{
> > + if (s->dirp == NULL) {
> > + return 0;
> > + }
> > +
> > + closedir(s->dirp);
> > + s->dirp = NULL;
> > +
> > + return 0;
> > +}
> > +
> > +static ssize_t virtio_pstore_do_erase(VirtIOPstore *s,
> > + struct virtio_pstore_req *req)
> > +{
> > + char path[PATH_MAX];
> > +
> > + virtio_pstore_to_filename(s, path, sizeof(path), req);
> > +
> > + return unlink(path);
> > +}
> > +
> > +static void virtio_pstore_handle_io(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > + VirtIOPstore *s = VIRTIO_PSTORE(vdev);
> > + VirtQueueElement *elem;
> > + struct virtio_pstore_req req;
> > + struct virtio_pstore_res res;
> > + ssize_t len = 0;
> > + int ret;
> > +
> > + for (;;) {
> > + elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > + if (!elem) {
> > + return;
> > + }
> > +
> > + if (elem->out_num < 1 || elem->in_num < 1) {
> > + error_report("request or response buffer is missing");
> > + exit(1);
> > + }
> > +
> > + len = iov_to_buf(elem->out_sg, elem->out_num, 0, &req, sizeof(req));
> > + if (len != (ssize_t)sizeof(req)) {
> > + error_report("invalid request size: %ld", (long)len);
> > + exit(1);
> > + }
> > + res.cmd = req.cmd;
> > + res.type = req.type;
> > +
> > + switch (le16_to_cpu(req.cmd)) {
> > + case VIRTIO_PSTORE_CMD_OPEN:
> > + ret = virtio_pstore_do_open(s);
> > + break;
> > + case VIRTIO_PSTORE_CMD_READ:
> > + ret = virtio_pstore_do_read(s, elem->in_sg, elem->in_num, &res);
> > + if (ret > 0) {
> > + len = ret;
> > + ret = 0;
> > + }
> > + break;
> > + case VIRTIO_PSTORE_CMD_WRITE:
> > + ret = virtio_pstore_do_write(s, elem->out_sg, elem->out_num, &req);
> > + break;
> > + case VIRTIO_PSTORE_CMD_CLOSE:
> > + ret = virtio_pstore_do_close(s);
> > + break;
> > + case VIRTIO_PSTORE_CMD_ERASE:
> > + ret = virtio_pstore_do_erase(s, &req);
> > + break;
> > + default:
> > + ret = -1;
> > + break;
> > + }
> > +
> > + res.ret = ret;
> > +
> > + iov_from_buf(elem->in_sg, elem->in_num, 0, &res, sizeof(res));
> > + virtqueue_push(vq, elem, sizeof(res) + len);
>
> this is wrong - len should be # of bytes written into guest memory.

???

All command except READ only write the virtio_pstore_ret so len is 0.
For READ, virtio_pstore_do_read() returns the actual bytes it wrote
(minus sizeof(res) of course), so I think it's fine, no?

Anyway, thanks for your review!

Thanks,
Namhyung


>
> > +
> > + virtio_notify(vdev, vq);
> > + g_free(elem);
> > +
> > + if (ret < 0) {
> > + return;
> > + }
> > + }
> > +}