Re: [PATCH RFC 8/9] ratelimit: Reduce ratelimit's false-positive misses
From: Petr Mladek
Date: Tue Apr 08 2025 - 12:45:24 EST
On Thu 2025-04-03 14:15:13, Paul E. McKenney wrote:
> The current ratelimit implementation can suffer from false-positive
> misses. That is, ___ratelimit() might return zero (causing the caller
> to invoke rate limiting, for example, by dropping printk()s) even when
> the current burst has not yet been consumed. This happens when one CPU
> holds a given ratelimit structure's lock and some other CPU concurrently
> invokes ___ratelimit(). The fact that the lock is a raw irq-disabled
> spinlock might make low-contention trylock failure seem unlikely, but
> vCPU preemption, NMIs, and firmware interrupts can greatly extend the
> trylock-failure window.
>
> Avoiding these false-positive misses is especially important when
> correlating console logged hardware failures with other information.
>
> Therefore, instead of attempting to acquire the lock on each call to
> ___ratelimit(), construct a lockless fastpath and only acquire the lock
> when retriggering (for the next burst) or when resynchronizing (due to
> either a long idle period or due to ratelimiting having been disabled).
> This reduces the number of lock-hold periods that can be extended
> by vCPU preemption, NMIs and firmware interrupts, but also means that
> these extensions must be of much longer durations (generally moving from
> milliseconds to seconds) before they can result in false-positive drops.
>
> In addition, the lockless fastpath gets a 10-20% speedup compared to
> the old fully locked code on my x86 laptop. Your mileage will of course
> vary depending on your hardware, workload, and configuration.
>
> [ paulmck: Apply feedback from Dan Carpenter. ]
>
> diff --git a/lib/ratelimit.c b/lib/ratelimit.c
> index bd6e3b429e333..03704c6f8899e 100644
> --- a/lib/ratelimit.c
> +++ b/lib/ratelimit.c
> @@ -26,55 +26,161 @@
> */
> int ___ratelimit(struct ratelimit_state *rs, const char *func)
> {
> - /* Paired with WRITE_ONCE() in .proc_handler().
> - * Changing two values seperately could be inconsistent
> - * and some message could be lost. (See: net_ratelimit_state).
> - */
> - int interval = READ_ONCE(rs->interval);
> + unsigned long begin;
> int burst = READ_ONCE(rs->burst);
> + int delta = 0;
> unsigned long flags;
> - int ret;
> -
> - if (!interval)
> - return 1;
> + bool gotlock = false;
> + bool initialized;
> + int interval = READ_ONCE(rs->interval);
> + unsigned long j;
> + int n_left;
>
> /*
> - * If we contend on this state's lock then almost
> - * by definition we are too busy to print a message,
> - * in addition to the one that will be printed by
> - * the entity that is holding the lock already:
> + * If the burst or interval settings mark this ratelimit_state
> + * structure as disabled, then clear the RATELIMIT_INITIALIZED bit
> + * in ->flags to force resetting of the ratelimiting interval when
> + * this ratelimit_state structure is next re-enabled.
> */
> - if (!raw_spin_trylock_irqsave(&rs->lock, flags)) {
> - ratelimit_state_inc_miss(rs);
> - return 0;
> + if (burst <= 0 || interval <= 0) {
> + if ((READ_ONCE(rs->flags) & RATELIMIT_INITIALIZED) &&
> + raw_spin_trylock_irqsave(&rs->lock, flags)) {
> + if (READ_ONCE(rs->flags) & RATELIMIT_INITIALIZED)
> + smp_store_release(&rs->flags, rs->flags & ~RATELIMIT_INITIALIZED);
> + raw_spin_unlock_irqrestore(&rs->lock, flags);
> + }
> + return true;
> }
>
> - if (!(rs->flags & RATELIMIT_INITIALIZED)) {
> - rs->begin = jiffies;
> - rs->flags |= RATELIMIT_INITIALIZED;
> + /*
> + * If this structure has just now been ratelimited, but not yet
> + * reset for the next rate-limiting interval, take an early and
> + * low-cost exit.
> + */
> + if (atomic_read_acquire(&rs->rs_n_left) <= 0) /* Pair with release. */
> + goto limited;
> +
> + /*
> + * If this structure is marked as initialized and has been
> + * recently used, pick up its ->begin field. Otherwise, pick up
> + * the current time and attempt to re-initialize the structure.
> + */
> + j = jiffies;
> + initialized = smp_load_acquire(&rs->flags) & RATELIMIT_INITIALIZED; /* Pair with release. */
> + if (initialized) {
> + begin = READ_ONCE(rs->begin);
> + } else {
> + /*
> + * Uninitialized or long idle, so reset ->begin and
> + * mark initialized. If we fail to acquire the lock,
> + * let the lock holder do the work.
> + */
> + begin = j;
> + if (raw_spin_trylock_irqsave(&rs->lock, flags)) {
> + if (!(READ_ONCE(rs->flags) & RATELIMIT_INITIALIZED)) {
> + begin = jiffies;
> + j = begin;
> + WRITE_ONCE(rs->begin, begin);
> + smp_store_release(&rs->flags, /* Pair with acquire. */
> + rs->flags | RATELIMIT_INITIALIZED);
> + initialized = true;
> + }
> + raw_spin_unlock_irqrestore(&rs->lock, flags);
> + }
> }
>
> - if (time_is_before_jiffies(rs->begin + interval)) {
> - int m = ratelimit_state_reset_miss(rs);
> + /*
> + * If this structure is still in the interval in which it has
> + * already hit the rate limit, take an early and low-cost exit.
> + */
> + if (initialized && time_before(begin - 2 * interval, j) && time_before(j, begin))
How do we know that the previous interval was ratelimited, please?
I have to admit that I still do not fully get the new logic ;-)
> + goto limited;
>
> - if (m) {
> - if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> - printk_deferred(KERN_WARNING
> - "%s: %d callbacks suppressed\n", func, m);
> - }
> + /*
> + * Register another request, and take an early (but not low-cost)
> + * exit if rate-limiting just now came into effect.
> + */
> + n_left = atomic_dec_return(&rs->rs_n_left);
> + if (n_left < 0)
> + goto limited; /* Just now started ratelimiting. */
> + if (n_left > 0) {
> + /*
> + * Otherwise, there is not yet any rate limiting for the
> + * current interval, and furthermore there is at least one
> + * last count remaining. But check to see if initialization
> + * is required or if we have run off the end of the interval
> + * without rate limiting having been imposed. Either way,
> + * we eventually return @true to tell our caller to go ahead.
> + */
> + if (initialized &&
> + time_before(begin - interval, j) && time_before(j, begin + interval))
> + return true; /* Nothing special to do. */
> + if (!raw_spin_trylock_irqsave(&rs->lock, flags))
> + return true; /* Let lock holder do special work. */
> + interval = READ_ONCE(rs->interval);
> + begin = rs->begin;
> + initialized = smp_load_acquire(&rs->flags) & RATELIMIT_INITIALIZED;
> + if (interval <= 0 ||
> + (initialized &&
> + time_before(begin - interval, j) && time_before(j, begin + interval))) {
> + /*
> + * Someone else beat us to the special work,
> + * so release the lock and return.
> + */
I am confused by the comment. IMHO, we are here when we do not need
to reset the interval. It happens either when the interval is
infinite (interval < 0) or when we are still in the current interval
and more calls are allowed (n_left > 0).
> + raw_spin_unlock_irqrestore(&rs->lock, flags);
> + return true;
> }
> - rs->begin = jiffies;
> - rs->printed = 0;
> +
> + /* We have the lock and will do initialization. */
> + gotlock = true;
> + delta = -1;
> }
> - if (burst && burst > rs->printed) {
> - rs->printed++;
> - ret = 1;
> - } else {
> - ratelimit_state_inc_miss(rs);
> - ret = 0;
> + if (!gotlock) {
> + /*
> + * We get here if we got the last count (n_left == 0),
> + * so that rate limiting is in effect for the next caller.
> + * We will return @true to tell our caller to go ahead,
> + * but first we acquire the lock and set things up for
> + * the next rate-limiting interval.
> + */
> + raw_spin_lock_irqsave(&rs->lock, flags);
I do not have a good feeling about this. It could cause a deadlock
in panic() when other CPUs are stopped or in NMI. And the ratelimit
used to be "safe" in this context.
Honestly, the new code is a bit hard to follow for me. The original
logic was more straightforward. I wonder if we could reduce the problem
"just" by checking the remaining allowed count.
I mean to introduce the atomic rs_n_left and check it when
it is not possible to take the lock, but keep most of
the current logic. Something like:
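
(What follows is only a rough sketch of that idea, not tested code: the
rs_n_left counter is the one added by this patch, and keeping it consistent
with rs->printed on the locked path is glossed over here.)

int ___ratelimit(struct ratelimit_state *rs, const char *func)
{
	int interval = READ_ONCE(rs->interval);
	int burst = READ_ONCE(rs->burst);
	unsigned long flags;
	int ret;

	if (!interval)
		return 1;

	if (!raw_spin_trylock_irqsave(&rs->lock, flags)) {
		/*
		 * Contended: consume one token from the lockless budget
		 * instead of dropping the message outright, so that a
		 * busy lock alone does not cause a false-positive miss.
		 */
		if (atomic_dec_if_positive(&rs->rs_n_left) >= 0)
			return 1;
		ratelimit_state_inc_miss(rs);
		return 0;
	}

	if (!(rs->flags & RATELIMIT_INITIALIZED)) {
		rs->begin = jiffies;
		rs->flags |= RATELIMIT_INITIALIZED;
	}

	if (time_is_before_jiffies(rs->begin + interval)) {
		int m = ratelimit_state_reset_miss(rs);

		if (m) {
			if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
				printk_deferred(KERN_WARNING
					"%s: %d callbacks suppressed\n", func, m);
			}
		}
		rs->begin = jiffies;
		rs->printed = 0;
		/* Refill the lockless budget for the new interval. */
		atomic_set(&rs->rs_n_left, burst);
	}
	if (burst && burst > rs->printed) {
		rs->printed++;
		ret = 1;
	} else {
		ratelimit_state_inc_miss(rs);
		ret = 0;
	}
	raw_spin_unlock_irqrestore(&rs->lock, flags);

	return ret;
}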