[RFC PATCH 0/4] random: a simple vDSO mechanism for reseeding userspace CSPRNGs

From: Yann Droneaud
Date: Thu Jan 12 2023 - 12:45:06 EST


Hi,

Here's my humble hack at improving the kernel for a faster, secure
arc4random() userspace implementation, by allowing userspace to buffer
getrandom()-generated entropy and discard it when the kernel's own
CSPRNG is reseeded.

It's largely built upon the vDSO work of Jason A. Donenfeld, as part of
his latest patchset "[PATCH v14 0/7] implement getrandom() in vDSO" [1],
but it's made simpler by providing only one of the tools userspace is
missing to properly buffer the output of getrandom().

Using MADV_WIPEONFORK and mlock(), userspace can reasonably offer forward
secrecy*, until something like VM_DROPPABLE[2] is provided by the kernel,
which would allow the buffer memory to never, ever be written to disk
before it's used, to not be inherited across fork(), and to not be
limited by RLIMIT_MEMLOCK.

* provided userspace can mlock() the memory and calls mlock() on the
buffer again after fork(), as memory locks are not inherited across fork().
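
To illustrate the MADV_WIPEONFORK plus mlock() combination described
above, here is a minimal sketch. The helper names rng_buffer_alloc() and
rng_buffer_relock() are hypothetical and not part of the patchset; only
mmap(), madvise(), mlock() and munmap() are real interfaces.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patchset): map `len` bytes of
 * anonymous memory for buffering getrandom() output, ask the kernel to
 * zero it in any forked child, and lock it so it cannot be swapped out.
 * Returns NULL on failure.
 */
void *rng_buffer_alloc(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	/* A child created by fork() sees zero-filled pages here. */
	if (madvise(buf, len, MADV_WIPEONFORK) != 0)
		goto fail;

	/* Can fail when the request exceeds RLIMIT_MEMLOCK. */
	if (mlock(buf, len) != 0)
		goto fail;

	return buf;

fail:
	munmap(buf, len);
	return NULL;
}

/*
 * Memory locks are not inherited by the child, so this is meant to be
 * called in the child after fork(), for instance from a
 * pthread_atfork() child handler.
 */
int rng_buffer_relock(void *buf, size_t len)
{
	return mlock(buf, len);
}
```

A caller would typically treat a NULL return as "no buffering possible"
and fall back to calling getrandom() directly for each request.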

As it's a hack, it's far from perfect. The main drawback I see is the
case where fresh entropy has to be discarded because the kernel's CSPRNG
generation was updated as a result of the very getrandom() call that
generated it. The workaround is to limit the amount of fresh entropy
fetched when a change of the kernel's CSPRNG generation is detected,
and to increase the amount of data retrieved with getrandom() when the
generation doesn't change between calls.
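
One possible shape for that workaround is sketched below. The function
name, the bounds and the "reset to minimum, otherwise double" policy are
all assumptions made for illustration; the test program in this series
may size its requests differently.

```c
#include <stddef.h>

/* Illustrative bounds, not taken from the patchset. */
#define FETCH_MIN 32u
#define FETCH_MAX 4096u

/*
 * Hypothetical policy helper: given the size of the previous
 * getrandom() request and whether the kernel's CSPRNG generation
 * changed since then, return the size to use for the next request.
 */
size_t next_fetch_size(size_t prev, int generation_changed)
{
	/* A reseed was seen: fetch little, so little can be wasted. */
	if (generation_changed)
		return FETCH_MIN;

	/* Generation is stable: grow the request, capped at FETCH_MAX. */
	if (prev >= FETCH_MAX / 2)
		return FETCH_MAX;
	if (prev < FETCH_MIN)
		return FETCH_MIN;
	return prev * 2;
}
```

With these bounds, a run of calls without reseeding grows the request
32, 64, 128 and so on up to 4096 bytes, and a detected generation change
drops it back to 32 bytes.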

Performance-wise, the improvements are there, as one can check with the
test program provided:

getrandom(,,GRND_TIMESTAMP) test
getrandom() support GRND_TIMESTAMP
found getrandom() in vDSO at 0x7ffc3efccc60
== direct syscall getrandom(), 16777216 u32, 2.866324020 s, 5.853 M u32/s, 170.846 ns/u32
== direct vDSO getrandom(), 16777216 u32, 2.883473280 s, 5.818 M u32/s, 171.868 ns/u32
== pooled syscall getrandom(), 16777216 u32, 1.152421219 s, 14.558 M u32/s, 68.690 ns/u32, (0 bytes discarded)
== pooled vDSO getrandom(), 16777216 u32, 0.162477863 s, 103.258 M u32/s, 9.684 ns/u32, (0 bytes discarded)
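
For reading the figures above: the ns/u32 column is the elapsed time
divided by the number of u32 values, and the ratio of elapsed times
gives the speedup. The two helpers below are only a sketch of that
arithmetic, with names chosen here for illustration.

```c
/* Nanoseconds per item, given elapsed seconds and an item count. */
double ns_per_item(double seconds, unsigned long count)
{
	return seconds * 1e9 / (double)count;
}

/* How many times faster the run taking `fast_seconds` is. */
double speedup(double slow_seconds, double fast_seconds)
{
	return slow_seconds / fast_seconds;
}
```

Applied to the numbers above, pooled syscall getrandom() is about 2.5
times faster than the direct syscall, and pooled vDSO getrandom() is
about 17.6 times faster.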

With the requirement to mlock() the memory page(s) used to buffer
getrandom() output, I'm not sure userspace could afford to allocate
4KBytes per thread, before being hit by RLIMIT_MEMLOCK (or worse,
OOM killer). Thus, some form of sharing between threads would be
needed, which would require locking, reducing the performance
shown above.
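
To get a feel for that budget, the sketch below computes how many
per-thread buffers fit under the current soft RLIMIT_MEMLOCK. The helper
names are hypothetical; getrlimit() and RLIMIT_MEMLOCK are the real
interfaces.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: number of buffers of `buf_size` bytes that fit
 * under a lock limit of `limit` bytes.
 */
unsigned long long buffers_within_limit(unsigned long long limit,
					unsigned long long buf_size)
{
	return buf_size ? limit / buf_size : 0;
}

/*
 * Query the soft RLIMIT_MEMLOCK of the calling process.
 * Returns 0 and stores the buffer count in *count, 1 if the limit is
 * RLIM_INFINITY (*count is left untouched), or -1 on error.
 */
int memlock_buffer_budget(unsigned long long buf_size,
			  unsigned long long *count)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;

	*count = buffers_within_limit(rl.rlim_cur, buf_size);
	return 0;
}
```

As an example, a 64 KiB limit would allow only 16 threads to each lock
a 4 KiB buffer, and that is before counting anything else the process
has locked.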

Also, I haven't studied the security impact of making the kernel's base
CSPRNG seed generation available to userspace. It can be made more
opaque if needed.

Regards.

[1] https://lore.kernel.org/all/20230101162910.710293-1-Jason@xxxxxxxxx/
[2] https://lore.kernel.org/all/20230101162910.710293-3-Jason@xxxxxxxxx/

Jason A. Donenfeld (2):
  random: introduce generic vDSO getrandom(,, GRND_TIMESTAMP) fast path
  x86: vdso: Wire up getrandom() vDSO implementation.

Yann Droneaud (2):
  random: introduce getrandom() GRND_TIMESTAMP
  testing: add a getrandom() GRND_TIMESTAMP vDSO demonstration/benchmark

 MAINTAINERS                                   |   1 +
 arch/x86/Kconfig                              |   1 +
 arch/x86/entry/vdso/Makefile                  |   3 +-
 arch/x86/entry/vdso/vdso.lds.S                |   2 +
 arch/x86/entry/vdso/vgetrandom.c              |  17 +
 arch/x86/include/asm/vdso/getrandom.h         |  42 +++
 arch/x86/include/asm/vdso/vsyscall.h          |   2 +
 arch/x86/include/asm/vvar.h                   |  16 +
 drivers/char/random.c                         |  52 ++-
 include/linux/random.h                        |  31 ++
 include/uapi/linux/random.h                   |   2 +
 include/vdso/datapage.h                       |   9 +
 lib/vdso/Kconfig                              |   5 +
 lib/vdso/getrandom.c                          |  51 +++
 tools/testing/crypto/getrandom/Makefile       |   4 +
 .../testing/crypto/getrandom/test-getrandom.c | 307 ++++++++++++++++++
 16 files changed, 543 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/entry/vdso/vgetrandom.c
 create mode 100644 arch/x86/include/asm/vdso/getrandom.h
 create mode 100644 lib/vdso/getrandom.c
 create mode 100644 tools/testing/crypto/getrandom/Makefile
 create mode 100644 tools/testing/crypto/getrandom/test-getrandom.c

--
2.37.2