[PATCH 0/2] [RFC] virtio-rng entropy leak reporting feature

From: Babis Chalios
Date: Thu Jan 19 2023 - 13:44:38 EST


Recently, a proposal has been published [1] for a new feature in the
VirtIO RNG device which will allow the device to report "entropy leaks"
to the guest VM. Such an event occurs when, for example, we take a VM
snapshot, or when we restore a VM from a snapshot.

The feature allows the guest to request that certain operations be
performed upon an entropy leak event. When such an event occurs, the
device will handle the requests and add the request buffers to the used
queue. Adding these buffers to the used queue serves as a notification
to the guest about the entropy leak event.

The proposed changes describe two types of requests that can be
performed: (1) fill a buffer in guest memory with random bytes and (2)
perform a memory copy between two buffers in guest memory.
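
To make the two request types concrete, here is a toy user-space model
of what the device does on a leak event. It is only a sketch: the names
(`leak_req`, `LEAK_REQ_FILL`, `LEAK_REQ_COPY`, `handle_entropy_leak`)
are hypothetical and come neither from the proposal nor from the
patches, no virtqueue API is involved, and `rand()` merely stands in
for the device's entropy source.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request kinds, named for illustration only. */
enum leak_req_type {
	LEAK_REQ_FILL,	/* (1) fill a guest buffer with random bytes */
	LEAK_REQ_COPY,	/* (2) copy one guest buffer into another */
};

/* Hypothetical in-memory form of a queued request. */
struct leak_req {
	enum leak_req_type type;
	uint8_t *dst;		/* buffer written by the device */
	const uint8_t *src;	/* source buffer, used by COPY only */
	size_t len;
	int used;		/* models "buffer is on the used queue" */
};

/*
 * Model of the device side of a leak event: handle every queued
 * request, then mark its buffer as used. Returns the number of
 * requests handled.
 */
size_t handle_entropy_leak(struct leak_req *reqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct leak_req *r = &reqs[i];

		if (r->type == LEAK_REQ_FILL) {
			/* Stand-in only; rand() is NOT an entropy source. */
			for (size_t j = 0; j < r->len; j++)
				r->dst[j] = (uint8_t)rand();
		} else {
			memcpy(r->dst, r->src, r->len);
		}
		r->used = 1;
	}
	return n;
}
```

In this model the guest would queue one fill request for a fresh seed
and one copy request that overwrites a generation value, and it learns
about the event by seeing `used` set on its buffers.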

The mechanism provides similar functionality to Microsoft's Virtual
Machine Generation ID and it can be used to re-seed the kernel's PRNG
upon taking a VM snapshot or resuming from one. Additionally, it makes
it possible to avoid the race condition that exists with our VMGENID
implementation, between the time a VM is resumed after a "leak event"
and the time the ACPI notification is handled and the new entropy is
added. Finally, it can be built upon to provide a mechanism for
notifying user-space about such events.

The first patch of this series extends the current virtio-rng driver to
implement the new feature. It ensures that there is always a request
queued to get some random bytes from the device in the event of an
entropy leak, and it feeds these bytes to the kernel as entropy through
`add_device_randomness`.

The second patch also adds a copy-on-leak command to the queue,
implementing the idea of a generation counter that has previously been
part of the VMGENID saga. It then exposes the value of the generation
counter over a sysfs file. User-space can read, mmap and poll on the
file in order to be notified about entropy leak events.
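
As a rough illustration of the mmap-based usage, a user-space consumer
could look something like the sketch below. The file path is left as a
parameter because it is not given here, and the 32-bit counter width,
the one-page mapping size and the function names are assumptions made
for the example, not something defined by the patches.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first page of the generation counter file read-only.
 * `path` is whatever sysfs file the driver exposes (hypothetical here).
 * Returns NULL on failure.
 */
const volatile uint32_t *map_generation(const char *path)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd = open(path, O_RDONLY);
	void *p;

	if (fd < 0 || page <= 0) {
		if (fd >= 0)
			close(fd);
		return NULL;
	}
	p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	return p == MAP_FAILED ? NULL : (const volatile uint32_t *)p;
}

/*
 * Compare the mapped counter with the last value the caller saw.
 * Returns 1 (and updates *last_seen) if the generation changed,
 * 0 otherwise. The counter is assumed to be 32 bits wide.
 */
int generation_changed(const volatile uint32_t *gen, uint32_t *last_seen)
{
	uint32_t now = *gen;

	if (now == *last_seen)
		return 0;
	*last_seen = now;
	return 1;
}
```

An application would call `generation_changed()` right before using any
cached secret or random state and regenerate that state whenever the
call returns 1.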

I have performed basic tests of the user-space interfaces using a
version of Firecracker in which I implemented virtio-rng with the
proposed features. Instructions on how to replicate this can be found
here:
https://github.com/bchalios/virtio-snapsafe-example

The patchset does not solve all problems. We do not define an API that
lets other parts of the kernel use the new functionality directly (i.e.
add commands to the queue), mainly because I'm not sure what the
correct API would be. I was toying with the idea of extending
`struct hwrng` with two new hooks that would be implemented only by
virtio-rng, but I'm not sure I like it, so I am open to suggestions.

As a result of the above, the way we use the functionality to add new
entropy, i.e. calling `add_device_randomness`, is as racy as the VMGENID
case, since it relies on used buffers being handled by the virtio driver.

As for user-space, the `mmap` interface *is* race-free. Changes in the
generation counter will be observable by user applications the moment VM
vcpus resume. However, the `poll` interface isn't: `sysfs_notify` is
likewise called when the virtio driver handles used buffers, so the
wake-up can lag behind the resume. I am not sure I have a solution for
this last one.
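
For reference, waiting on a sysfs attribute from user-space usually
follows the pattern sketched below: read the file once, then poll for
POLLPRI | POLLERR, which is how `sysfs_notify` wake-ups are reported.
The function name and the assumption that the caller has already opened
the generation file are mine. A return value of 1 only means the driver
has processed the used buffers, so it is subject to the delay described
above.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block until the sysfs attribute behind `fd` is notified.
 * Returns 1 on notification, 0 on timeout, -1 on error.
 * `fd` is assumed to be an open descriptor for the generation file.
 */
int wait_for_leak_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	char buf[32];
	int ret;

	/* A sysfs attribute has to be (re)read from offset 0 before
	 * poll() will wait for the next notification. */
	if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
		return -1;
	if (read(fd, buf, sizeof(buf)) < 0)
		return -1;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & (POLLPRI | POLLERR))) ? 1 : 0;
}
```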

By posting this, I hope we can resume the discussion about solving the
above issues (or any other issue that I haven't thought of), especially
with regard to providing a mechanism suitable for user-space
notifications.

Cheers,
Babis

Babis Chalios (2):
virtio-rng: implement entropy leak feature
virtio-rng: add sysfs entries for leak detection

drivers/char/hw_random/virtio-rng.c | 360 +++++++++++++++++++++++++++-
include/uapi/linux/virtio_rng.h | 3 +
2 files changed, 357 insertions(+), 6 deletions(-)

--
2.38.1
