Re: [PATCH v5 0/7] treewide cleanup of random integer usage

From: Yury Norov
Date: Sat Oct 08 2022 - 17:42:44 EST


On Fri, Oct 07, 2022 at 11:53:52PM -0600, Jason A. Donenfeld wrote:
> Changes v4->v5:
> - Coccinelle is now used for as many mechanical aspects as possible,
>   with mechanical parts split off from non-mechanical parts. This should
>   drastically reduce the amount of code that needs to be reviewed
>   carefully. Each commit now mentions whether it was done by hand or is
>   mechanical.
>
> Hi folks,
>
> This is a five part treewide cleanup of random integer handling. The
> rules for random integers are:
>
> - If you want a secure or an insecure random u64, use get_random_u64().
> - If you want a secure or an insecure random u32, use get_random_u32().
>   * The old function prandom_u32() has been deprecated for a while now
>     and is just a wrapper around get_random_u32(). Same for
>     get_random_int().
> - If you want a secure or an insecure random u16, use get_random_u16().
> - If you want a secure or an insecure random u8, use get_random_u8().
> - If you want secure or insecure random bytes, use get_random_bytes().
>   * The old function prandom_bytes() has been deprecated for a while now
>     and has long been a wrapper around get_random_bytes().
> - If you want a non-uniform random u32, u16, or u8 bounded by a certain
>   open interval maximum, use prandom_u32_max().
>   * I say "non-uniform", because it doesn't do any rejection sampling or
>     divisions. Hence, it stays within the prandom_* namespace.
>
> These rules ought to be applied uniformly, so that we can clean up the
> deprecated functions, and earn the benefits of using the modern
> functions. In particular, in addition to the boring substitutions, this
> patchset accomplishes a few nice effects:
>
> - By using prandom_u32_max() with an upper-bound that the compiler can
>   prove at compile-time is ≤65536 or ≤256, internally get_random_u16()
>   or get_random_u8() is used, which wastes fewer batched random bytes,
>   and hence has higher throughput.
>
> - By using prandom_u32_max() instead of %, when the upper-bound is not a
>   constant, division is still avoided, because prandom_u32_max() uses
>   a faster multiplication-based trick instead.
>
> - By using get_random_u16() or get_random_u8() in cases where the return
>   value is intended to indeed be a u16 or a u8, we waste fewer batched
>   random bytes, and hence have higher throughput.
>
> So, based on those rules and benefits from following them, this patchset
> breaks down into the following five steps:
>
> 1) Replace `prandom_u32() % max` and variants thereof with
>    prandom_u32_max(max).
>
>    * Part 1 is done with Coccinelle. Part 2 is done by hand.
>
> 2) Replace `(type)get_random_u32()` and variants thereof with
>    get_random_u16() or get_random_u8(). I took the pains to actually
>    look and see what every lvalue type was across the entire tree.
>
>    * Part 1 is done with Coccinelle. Part 2 is done by hand.
>
> 3) Replace remaining deprecated uses of prandom_u32() and
>    get_random_int() with get_random_u32().
>
>    * A boring search and replace operation.
>
> 4) Replace remaining deprecated uses of prandom_bytes() with
>    get_random_bytes().
>
>    * A boring search and replace operation.
>
> 5) Remove the deprecated and now-unused prandom_u32() and
>    prandom_bytes() inline wrapper functions.
>
>    * Just deleting code and updating comments.
>
> I was thinking of taking this through my random.git tree (on which this
> series is currently based) and submitting it near the end of the merge
> window, or waiting for the very end of the 6.1 cycle when there will be
> the fewest new patches brewing. If somebody with some treewide-cleanup
> experience might share some wisdom about what the best timing usually
> winds up being, I'm all ears.
>
> Please take a look! The number of lines touched is quite small, so this
> should be reviewable, and as much as is possible has been pushed into
> Coccinelle scripts.

For the series:
Reviewed-by: Yury Norov <yury.norov@xxxxxxxxx>

Although, looking at it, I have a feeling that the kernel needs to drop all
fixed-size random APIs like get_random_uXX() or get_random_int(), because
people will continue to use 'get_random_int() % num' carelessly.
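
To make the '% num' concern concrete, here is a minimal userspace C sketch,
not kernel code. The helper names (bound_by_modulo, bound_by_multiply,
bound_by_multiply_u8, histogram_u8) are hypothetical and exist only for
illustration. The multiply-and-shift form is assumed to correspond to the
multiplication-based trick described for prandom_u32_max() in the quoted
cover letter. The 8-bit variant is there so that every possible input can be
enumerated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bound a random u32 with '%', which costs a division. */
static uint32_t bound_by_modulo(uint32_t rnd, uint32_t bound)
{
	assert(bound > 0);
	return rnd % bound;
}

/*
 * Hypothetical helper: multiply-and-shift. Scales rnd from [0, 2^32)
 * down to [0, bound) using one 64-bit multiplication and no division.
 */
static uint32_t bound_by_multiply(uint32_t rnd, uint32_t bound)
{
	return (uint32_t)(((uint64_t)rnd * bound) >> 32);
}

/* Same idea on 8 bits, small enough to enumerate every input. */
static uint8_t bound_by_multiply_u8(uint8_t rnd, uint8_t bound)
{
	return (uint8_t)(((uint16_t)rnd * bound) >> 8);
}

/*
 * Count how many of the 256 possible u8 inputs land on each output,
 * using either multiply-and-shift or plain modulo.
 */
static void histogram_u8(uint8_t bound, int use_multiply, unsigned counts[256])
{
	assert(bound > 0);
	for (int i = 0; i < 256; i++)
		counts[i] = 0;
	for (int x = 0; x < 256; x++) {
		uint8_t out = use_multiply
			? bound_by_multiply_u8((uint8_t)x, bound)
			: (uint8_t)(x % bound);
		counts[out]++;
	}
}
```

With bound 6 over all 256 u8 inputs, the modulo version maps 43 inputs to
each of 0..3 and 42 inputs to each of 4 and 5. The multiply-and-shift version
maps 42 inputs to each of 2 and 5 and 43 to the rest. Neither is uniform
without rejection sampling; the multiply form only avoids the division.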

Thanks,
Yury