Re: Linux 5.3-rc8

From: Theodore Y. Ts'o
Date: Sat Sep 14 2019 - 22:06:08 EST


On Sat, Sep 14, 2019 at 06:10:47PM -0700, Linus Torvalds wrote:
> > We could return 0 for success, and yet "the best we
> > can do" could be really terrible.
>
> Yes. Which is why we should warn.

I'm all in favor of warning. But people might just ignore the
warning. We warn today about systemd trying to read from /dev/urandom
too early, and that just gets ignored.

> But we can't *block*. Because that just breaks people. Like shown in
> this whole discussion.

I'd be willing to let it block for at least two minutes, since that's
slow enough to be annoying (and thus to get the underlying problem
fixed). I'd even be willing to kill the process that tried to call
getrandom too early. But I believe blocking is better than returning
something that is potentially not random at all. Failing "safe" is
extremely important, and returning something non-random which then
gets used for a long-term private key is a disaster.
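
For reference, here is a trivial userspace sketch -- mine, not
something from this thread -- of what callers see today: with
flags == 0, getrandom(2) blocks until the pool is initialized, and
with GRND_NONBLOCK it fails with EAGAIN instead, so the application
can decide for itself what failing "safe" means:

#include <errno.h>
#include <stdio.h>
#include <sys/random.h>
#include <sys/types.h>

int main(void)
{
	unsigned char key[32];
	ssize_t n = getrandom(key, sizeof(key), GRND_NONBLOCK);

	if (n < 0 && errno == EAGAIN) {
		/* Pool not initialized yet; a fail-safe caller stops here. */
		fprintf(stderr, "entropy pool not ready, not generating a key\n");
		return 1;
	}
	if (n != (ssize_t) sizeof(key)) {
		perror("getrandom");
		return 1;
	}
	/* 32 fully initialized random bytes, fit for a long-term key. */
	printf("got %zd random bytes\n", n);
	return 0;
}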

You basically want to turn getrandom into /dev/urandom. And that's
how we got into the mess where 10% of the publicly accessible ssh
keys could be guessed. We've tried that already, and we saw how it
ended.

> Why is warning different? Because hopefully it tells the only person
> who can *do* something about it - the original maintainer or developer
> of the user space tools - that they are doing something wrong and need
> to fix their broken model.

Except that developers can (and do) just ignore the warning, which is
what happened with /dev/urandom when it was accessed too early. Even
when I drew some developers' attention to the warning, at least one
just said "meh" and blew me off. Would making it noisier (e.g., a
WARN_ON) make enough of a difference? I guess I'm just not convinced.
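
To make "noisier" concrete, here is a rough sketch of the difference;
the exact messages are illustrative, not the current code in
drivers/char/random.c:

/* Today: a one-line notice in dmesg that is easy to scroll past. */
pr_notice("random: %s: uninitialized urandom read (%zd bytes read)\n",
	  current->comm, nbytes);

/* Noisier: WARN_ONCE() dumps a full stack trace and taints the
 * kernel, which is much harder for a developer (or a distro's CI)
 * to wave off.
 */
WARN_ONCE(1, "%s read from urandom before the CRNG was initialized\n",
	  current->comm);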

> Blocking doesn't do that. Blocking only makes the system unusable. And
> yes, some security people think "unusable == secure", but honestly,
> those security people shouldn't do system design. They are the worst
> kind of "technically correct" incompetent.

Which is worse really depends on your point of view, and on what the
system might be controlling. If access to the system could let a
malicious attacker trigger a nuclear bomb, failing safe is always
going to be better. In other cases, failing open is certainly more
convenient, and it leaves the system more "usable". But how do we
trade off "usable" against "insecure"? There are times when
"unusable" is WAY better than "could risk life or human safety".

Would you be willing to settle for a CONFIG option or a boot
command-line option which controls whether we fail "safe" or fail
"open" when someone calls getrandom(2) and there isn't enough
entropy? Then each distribution and/or system integrator could decide
whether "proper system design" puts more weight on "usability" or on
"must not fail insecurely".

- Ted