Re: Linux 5.3-rc8

From: Theodore Y. Ts'o
Date: Mon Sep 16 2019 - 13:22:29 EST


On Mon, Sep 16, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
> So the semantics that getrandom() should have had are:
>
> getrandom(0) - just give me reasonable random numbers for any of a
> million non-strict-long-term-security use (ie the old urandom)
>
> - the nonblocking flag makes no sense here and would be a no-op

That change is what I consider highly problematic. There are a *huge*
number of applications using cryptography which assume that
getrandom(0) means, "I'm guaranteed to get something safe for
cryptographic use". Changing this now would leave a very large number
of applications insecure. Part of the problem here is that
there are many different actors. There is the application or
cryptographic library developer, who may want to be sure they have
cryptographically secure random numbers. They are the ones who will
select getrandom(0).

Then you have the distribution or consumer-grade electronics
developers who may choose to run these applications from some init
script or systemd unit file. And some of these people may do
something stupid, like run things too early, or omit a hardware random
number generator from their design, even though it's for a security
critical purpose (say, a digital wallet for bitcoin). Because some of these
people might do something stupid, one argument (not mine) is that we
must therefore not let getrandom() block. But doing this penalizes
the security of all the users of the application, not just the stupid
ones.
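
To make the application developer's side of this concrete, here is a
minimal sketch (Python only for brevity; get_key_material is a made-up
helper name, not taken from any real library). With flags of 0 the
call blocks until the CRNG has been initialized; with GRND_NONBLOCK it
fails with EAGAIN instead, and the caller has to decide what to do:

```python
import os

def get_key_material(nbytes, wait=True):
    """Hypothetical helper: fetch nbytes via getrandom(2).

    wait=True  -> flags=0, blocks until the CRNG is initialized.
    wait=False -> GRND_NONBLOCK, returns None if the CRNG is not ready.
    """
    flags = 0 if wait else os.GRND_NONBLOCK
    buf = b""
    while len(buf) < nbytes:
        try:
            # os.getrandom() may return fewer bytes than requested,
            # so keep asking for the remainder.
            buf += os.getrandom(nbytes - len(buf), flags)
        except BlockingIOError:
            # EAGAIN: the CRNG has not been initialized yet.
            return None
    return buf
```

A crypto library would typically call this with wait=True and accept
the block; the point is that the library author, not the integrator,
made that choice.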

> getrandom(GRND_RANDOM) - get me actual _secure_ random numbers with
> blocking until entropy pool fills (but not the completely invalid
> entropy decrease accounting)
>
> - the nonblocking flag is useful for bootup and for "I will
> actually try to generate entropy".
>
> and both of those are very very sensible actions. That would actually
> have _fixed_ the problems we had with /dev/[u]random, both from a
> performance standpoint and for a filesystem access standpoint.
>
> But that is sadly not what we have right now.
>
> And I suspect we can't fix it, since people have grown to depend on
> the old behavior, and already know to avoid GRND_RANDOM because it's
> useless with old kernels even if we fixed it with new ones.

I don't think we can fix it, because it's the changing of
getrandom(0)'s behavior which is the problem, not GRND_RANDOM. People
*expect* getrandom(0) to always return secure results. I don't think
we can make it sometimes return not-necessarily secure results
depending on when the systems integrator or distribution decides to
run the application, and depending on the hardware platform (yes,
traditional x86 systems are probably fine, and fortunately x86
embedded CPUs are too expensive and have lousy power management, so no
one really uses x86 for embedded yet, despite Intel's best efforts).
That would just be a purely irresponsible thing to do, IMO.

> Does anybody really seriously debate the above? Ted? Are you seriously
> trying to claim that the existing GRND_RANDOM has any sensible use?
> Are you seriously trying to claim that the fact that we don't have a
> sane urandom source is a "feature"?

There are people who would argue that GRND_RANDOM does have sensible
use cases. GPG uses /dev/random, and that was a fully informed choice.
I'm not convinced, because I think that at least for now the CRNG is
perfectly fine for 99.999% of the use cases. Yes, in a post-quantum
cryptography world, the CRNG might be screwed --- but so will most of
the other cryptographic algorithms in the kernel. So if anyone ever
gets post-quantum cryptanalytic attacks working, the use of the CRNG
is going to be the least of our problems.

As I mentioned to you in Lisbon, I've been going back and forth about
whether or not to rip out the entire /dev/random infrastructure,
mainly for code maintainability reasons. The only reason I've
been holding back is that there are (very few) non-insane people
who do want to use it. There is also a much larger number of rational
people who use it because they want some insane PCI compliance labs to
go away. What I suspect most of them are actually doing in practice
is they use /dev/random, but they also use a hardware random number
generator so /dev/random never actually blocks in practice. The use
of /dev/random is enough to make the PCI compliance lab go away, and
the hardware random number generator (or virtio-rng on a VM) makes
/dev/random useable.

But I don't think we can reuse GRND_RANDOM for that reason.

We could create a new flag, GRND_INSECURE, which never blocks. And
that allows us to solve the problem for silly applications that
are using getrandom(2) for non-cryptographic use cases. Use cases
might include Python dictionary seeds, gdm for MIT Magic Cookie, UUID
generation where best effort is probably good enough, etc. The
answer today is they should just use /dev/urandom, since that exists
today, and we have to support it for backwards compatibility anyway.
It sounds like gdm recently switched to getrandom(2), and I suspect
that it's going to get caught on some hardware configs anyway, even
without the ext4 optimization patch. So I suspect gdm will switch
back to /dev/urandom, and this particular pain point will probably go
away.
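
For the non-cryptographic cases, the /dev/urandom route looks roughly
like this (a sketch in Python; best_effort_seed is a hypothetical
name, and the 8-byte default is just an illustrative choice):

```python
def best_effort_seed(nbytes=8):
    """Hypothetical helper: non-cryptographic seed from /dev/urandom.

    Reading /dev/urandom does not block, even early in boot, so the
    result is only "best effort" and must not be used for keys.
    """
    with open("/dev/urandom", "rb") as f:
        data = f.read(nbytes)
    return int.from_bytes(data, "little")
```

Something like a hash-table seed or a display cookie can live with
that; anything that needs real keys should stay on getrandom(0).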

- Ted