Re: [PATCH] Make rcutorture RNG use temporal entropy

From: Paul E. McKenney
Date: Tue Sep 04 2007 - 13:47:42 EST


On Tue, Sep 04, 2007 at 09:14:19AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 04, 2007 at 11:16:50AM +0530, Satyam Sharma wrote:
> > Hi Paul,
> >
> > On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> > >
> > > The locking used by get_random_bytes() can conflict with the
> > > preempt_disable() and synchronize_sched() form of RCU. This patch changes
> > > rcutorture's RNG to gather entropy from the new cpu_clock() interface
> > > (relying on interrupts, preemption, daemons, and rcutorture's reader
> > > thread's rock-bottom scheduling priority to provide useful entropy),
> > > and also adds and EXPORT_SYMBOL_GPL() to make that interface available
> > > to GPLed kernel modules such as rcutorture.
> >
> > Honestly, rcutorture goes to some amazing lengths just to have this
> > randomizing-the-delays-that-read/write-test-threads-spend-inside-or-
> > outside-the-critical-sections thing :-) Especially, seeing that
> > synchro-test, the other "comparable" module, just doesn't bother with
> > all this at all. (especially check out its load == interval == do_sched
> > == 0 case! :-)
>
> Yep. The need for that level of randomization in rcutorture has been made
> painfully clear to me over a period of more than a decade. Of course,
> the overhead of the re-seeding does get diluted by a factor of 10,000 or
> 100,000, depending on what version you are using. So, from a throughput
> standpoint, the overhead is essentially that of a linear congruential
> random-number generator. This is critically important given the low
> overhead of rcu_read_lock() and rcu_read_unlock().
>
> Still, this is indeed not what you want on a fastpath of a realtime
> system, where average performance means nothing -- only the worst case
> counts. And this is why I am -not- putting the rcutorture RNG forward
> for general-purpose use. So we are at least in agreement on that piece!
>
> And, as you hint below, anyone running rcutorture while also running
> a production realtime workload needs to seriously rethink their design. ;-)
> (If you are instead running it to provide a test load for your realtime
> testing, fine and good.)
>
> > So IMHO, considering that rcutorture isn't a "serious" user of randomness
> > in the first place (of even a "fast-and-loose version" for that matter),
> > you could consider a solution where you gather all the randomness you need
> > at module_init time itself and save it somewhere, and then use it wherever
> > you're calling into rcu_random()->cpu_clock() [ or get_random_bytes() ]
> > in the current code. You could even make some trivial updates to those
> > random numbers after every RCU_RANDOM_REFRESH uses, like present.
>
> Well, assuming that the Linux kernel really needs a central implementation
> of a "pretty fast" and "pretty good" RNG, one could imagine all sorts of
> designs:
>
> 1. Use an LCRNG feeding into an array, as the old Berkeley random()
> does (or see Knuth for an earlier citation), but make it per-CPU.
> When pulling out randomness, do an MDn hash on the array
> along with a per-task counter and the per-CPU preempt counter.
> Increment the per-task counter on each use. Do an LCRNG step
> on each use. Since this is a fixed array, the collisions in
> CONFIG_PREEMPT due to preemption can be permitted to happen
> without penalty.
>
> This approach avoids all locking, all interrupt disabling, and
> all preemption disabling. But the MD hashes aren't the fastest
> things in the kernel, from what I understand.
>
> Question: will this be fast enough? If so, which of the MD
> hashes should be used?
>
> 2. As in #1 above, but use some simpler hash, such as addition or
> XOR. Maybe CRC. (Benchmark for speed.)
>
> 3. Just use a simple LCRNG with per-task state. Perturb from some
> statistical counter (the per-CPU RCU grace-period counter might
> be appropriate). Or don't even bother doing that.
>
> This would be -much- faster than any of the above, and would
> be deterministic, hence good for realtime use. LCRNG might not
> satisfy more-demanding users, especially the paranoid ones.
>
> (This is what you are proposing above, correct?)
>
> 4. Just use LCRNG into a array like Berkeley random(), but replicate
> on a per-CPU basis. Maybe or maybe not perturb occasionally
> from some statistical counter as in #3 above.
>
> This would be reasonably fast, and should satisfy most users.
> People needing cryptographically secure RNGs should of course
> stick with get_random_bytes().
>
> [If I had some blazing reason to implement this -right- -now-,
> this would be the approach I would take.]
>
> 5. Stick with the current situation where people needing fast
> and dirty RNGs roll their own.

Or, better yet, as suggested by Rusty:

6. Use random32() from lib/random32.c and be happy. This does
disable preemption across the calculation, which should not
be a problem in most situations. Although it does not protect
against interrupts, the effect would simply be to scramble
the state a bit more (or perhaps unscramble it a bit).

The overhead should not be too bad.

There. My work is done. ;-)

Thanx, Paul

> > Agreed, anybody running rcutorture isn't really looking for performance,
> > but why call get_random_bytes() or cpu_clock() (and the smp_processor_id()
> > + irq_save/restore + export_symbol() that goes with it) when it isn't
> > _really_ "required" as such ...
>
> Well, that would in fact be why the high-overhead path is taken only
> very rarely.
>
> And again, I am -not- putting the rcutorture RNG forward for general use,
> as it is a no-go for realtime fastpath use.
>
> Votes for #5 above? Given the total lack of any sort of response to
> Stephan Eranian's proposal last year, might be optimal. ;-)
>
> Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/