Re: Linux 5.3-rc8

From: Alexander E. Patrakov
Date: Mon Sep 16 2019 - 14:00:38 EST


On 16.09.2019 22:21, Theodore Y. Ts'o wrote:
On Mon, Sep 16, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
So the semantics that getrandom() should have had are:

getrandom(0) - just give me reasonable random numbers for any of a
million non-strict-long-term-security use (ie the old urandom)

- the nonblocking flag makes no sense here and would be a no-op

That change is what I consider highly problematic. There are a *huge*
number of applications which use cryptography and assume that
getrandom(0) means, "I'm guaranteed to get something safe for
cryptographic use". Changing this now would leave a very large number
of applications insecure. Part of the problem here is that
there are many different actors. There is the application or
cryptographic library developer, who may want to be sure they have
cryptographically secure random numbers. They are the ones who will
select getrandom(0).

Then you have the distribution or consumer-grade electronics
developers who may choose to run them too early in some init script or
systemd unit files. And some of these people may do something stupid,
like run things too early, or omit a hardware random number
generator in their design, even though it's for a security critical
purpose (say, a digital wallet for bitcoin). Because some of these
people might do something stupid, one argument (not mine) is that we
must therefore not let getrandom() block. But doing this penalizes
the security of all the users of the application, not just the stupid
ones.

On Linux, there is no such thing as "too early", that's the problem.

First, we already had one lesson about this, regarding applications that require libraries from /usr. There, it was due to various programs that run from udev rules, and dynamic/unpredictable dependencies. See https://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken/, almost all arguments from there apply 1:1 here.

Second, people/distributions put unexpected stuff into their initramfs images, and we cannot say that they have no right to do so. E.g., on my system that's "cryptsetup" that unlocks the root partition, but manages to read a few bytes of uninitialized urandom before that. A warning here is almost unavoidable, and thus will be treated as SPAM.

No such considerations apply to OpenBSD (initramfs does not exist, and there is no equivalent of udev that reacts to cold-plug events by running programs), that's why the getentropy() design works there.

If we were to fix it, we should focus on making true entropy available unconditionally, even before /init in the initramfs starts, and warn not on the first access to urandom, but on the exec of /init. Look - distributions are already running "haveged", which harvests entropy from clock jitter. And they still manage to do it wrong (regardless of whether the "haveged" idea is sound by itself), by running it too late (at least I don't know of any stock initramfs that includes either it or rngd). So it's too complex, and needs to be simplified.

The kernel already has jitterentropy-rng, it uses the same idea as "haveged", but, alas, it is exposed as a crypto rng algorithm, not a hwrng. And I think it is a bug: cryptoapi rng algorithms are for things that get a seed and generate random numbers by rehashing it over and over, while jitterentropy-rng requires no seed. Would a patch be accepted to convert it to hwrng? (this is essentially the reverse of what commit c46ea13 did for exynos-rng)


getrandom(GRND_RANDOM) - get me actual _secure_ random numbers with
blocking until entropy pool fills (but not the completely invalid
entropy decrease accounting)

- the nonblocking flag is useful for bootup and for "I will
actually try to generate entropy".

and both of those are very very sensible actions. That would actually
have _fixed_ the problems we had with /dev/[u]random, both from a
performance standpoint and for a filesystem access standpoint.

But that is sadly not what we have right now.

And I suspect we can't fix it, since people have grown to depend on
the old behavior, and already know to avoid GRND_RANDOM because it's
useless with old kernels even if we fixed it with new ones.

I don't think we can fix it, because it's the changing of
getrandom(0)'s behavior which is the problem, not GRND_RANDOM. People
*expect* getrandom(0) to always return secure results. I don't think
we can make it sometimes return not-necessarily secure results
depending on when the systems integrator or distribution decides to
run the application, and depending on the hardware platform (yes,
traditional x86 systems are probably fine, and fortunately x86
embedded CPU are too expensive and have lousy power management, so no
one really uses x86 for embedded yet, despite Intel's best efforts).
That would just be a purely irresponsible thing to do, IMO.

Does anybody really seriously debate the above? Ted? Are you seriously
trying to claim that the existing GRND_RANDOM has any sensible use?
Are you seriously trying to claim that the fact that we don't have a
sane urandom source is a "feature"?

There are people who can debate that GRND_RANDOM has any sensible use
cases. GPG uses /dev/random, and that was a fully informed choice.
I'm not convinced, because I think that at least for now the CRNG is
perfectly fine for 99.999% of the use cases. Yes, in a post-quantum
cryptography world, the CRNG might be screwed --- but so will most of
the other cryptographic algorithms in the kernel. So if anyone ever
gets post-quantum cryptoanalytic attacks working, the use of the CRNG
is going to be least of our problems.

As I mentioned to you in Lisbon, I've been going back and forth about
whether or not to rip out the entire /dev/random infrastructure,
mainly for code maintainability reasons. The only reason why I've
been holding back is because there are (very few) non-insane people
who do want to use it. There are also a much larger number of
rational people who use it because they want some insane PCI
compliance labs to
go away. What I suspect most of them are actually doing in practice
is they use /dev/random, but they also use a hardware random number
generator so /dev/random never actually blocks in practice. The use
of /dev/random is enough to make the PCI compliance lab go away, and
the hardware random number generator (or virtio-rng on a VM) makes
/dev/random useable.

Please don't forget about people who run Linux on Hyper-V, not on KVM, and thus have no access to virtio-rng ;)


But I don't think we can reuse GRND_RANDOM for that reason.

We could create a new flag, GRND_INSECURE, which never blocks. And
that allows us to solve the problem for silly applications that
are using getrandom(2) for non-cryptographic use cases. Use cases
might include Python dictionary seeds, gdm for MIT Magic Cookie, UUID
generation where best efforts probably is good enough, etc. The
answer today is they should just use /dev/urandom, since that exists
today, and we have to support it for backwards compatibility anyway.
It sounds like gdm recently switched to getrandom(2), and I suspect
that it's going to get caught on some hardware configs anyway, even
without the ext4 optimization patch. So I suspect gdm will switch
back to /dev/urandom, and this particular pain point will probably go
away.

- Ted


Well, at this point, I see that there is a lot of disagreement about how getrandom() should behave, aggravated by the baggage of existing applications and libraries with contradictory requirements regarding getrandom(0) (so not really solvable). I am almost convinced that we might want to return -ENOSYS unconditionally, and create a different system call with sane flags.

--
Alexander E. Patrakov

Attachment: smime.p7s
Description: S/MIME cryptographic signature