Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Kees Cook
Date: Tue Apr 30 2019 - 14:01:46 EST


On Tue, Apr 30, 2019 at 10:51 AM Reshetova, Elena
<elena.reshetova@xxxxxxxxx> wrote:
> base: Simple syscall: 0.1761 microseconds
> get_random_bytes (4096 bytes per-cpu buffer): 0.1793 microseconds
> get_random_bytes (64 bytes per-cpu buffer): 0.1866 microseconds

The 4096 size seems pretty good.

> Below is a snippet of what I quickly did (relevant parts) to get these numbers.
> I do the initial population of the per-cpu buffers in a late_initcall, but
> practice shows that the rng might not always be in a good state by then.
> So we might not have really good randomness at that point, but I am not sure
> if this is a practical problem, since it only applies to system boot, and by
> the time the system has booted, it has already issued enough syscalls that the
> buffer gets refilled with really good numbers.
> Alternatively, we could also do it on the first syscall that each cpu gets, but
> I am not sure that is always guaranteed to have good randomness.

Populating at first syscall seems like a reasonable way to delay. And
I agree: I think we should not be too concerned about early RNG state:
we should design for the "after boot" behaviors.

> diff --git a/lib/percpu-random.c b/lib/percpu-random.c
> new file mode 100644
> index 000000000000..3f92c44fbc1a
> --- /dev/null
> +++ b/lib/percpu-random.c
> @@ -0,0 +1,49 @@
> +#include <linux/types.h>
> +#include <linux/percpu.h>
> +#include <linux/random.h>
> +
> +static DEFINE_PER_CPU(struct rnd_buffer, stack_rand_offset) __latent_entropy;
> +
> +
> +/*
> + * Generate some initially weak seeding values to allow
> + * to start the prandom_u32() engine.
> + */
> +static int __init stack_rand_offset_init(void)
> +{
> + int i;
> +
> + /* extract bits to our per-cpu rand buffers */
> + for_each_possible_cpu(i) {
> + struct rnd_buffer *buffer = &per_cpu(stack_rand_offset, i);
> + buffer->byte_counter = 0;
> + /*
> + * If the rng is not yet initialized, this won't extract
> + * good randomness, but we cannot wait for the rng to
> + * initialize either.
> + */
> + get_random_bytes(&(buffer->buffer), sizeof(buffer->buffer));

Instead of doing get_random_bytes() here, just set byte_counter =
RANDOM_BUFFER_SIZE and let random_get_byte() do the work on a per-cpu
basis?

> +
> + }
> +
> + return 0;
> +}
> +late_initcall(stack_rand_offset_init);
> +
> +unsigned char random_get_byte(void)
> +{
> + struct rnd_buffer *buffer = &get_cpu_var(stack_rand_offset);
> + unsigned char res;
> +
> + if (buffer->byte_counter >= RANDOM_BUFFER_SIZE) {
> + get_random_bytes(&(buffer->buffer), sizeof(buffer->buffer));
> + buffer->byte_counter = 0;
> + }
> +
> + res = buffer->buffer[buffer->byte_counter];
> + buffer->buffer[buffer->byte_counter] = 0;
> + buffer->byte_counter++;
> + put_cpu_var(stack_rand_offset);
> + return res;
> +}
> +EXPORT_SYMBOL(random_get_byte);

Otherwise, sure, looks good. I remain worried about info leaks of the
percpu area causing pain down the road, but if we find a safer way to do
this, we can do it later.

--
Kees Cook