Re: [RFC] per-cpu preempt_count

From: Linus Torvalds
Date: Mon Aug 12 2013 - 15:01:05 EST


On Mon, Aug 12, 2013 at 10:58 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> We could still have the advantages of NEED_RESCHED in preempt_count() by
> realizing that we only rarely actually set/clear need_resched and mostly
> read it from the highest freq user, the preempt_enable() check.
>
> So we could have it atomic, but do atomic_read() in the preempt_enable()
> hotpath which wouldn't suck donkey balls, right?

Wrong. The thing is, the common case for preempt is incrementing and
decrementing the count, not testing it, exactly because we do this for
spinlocks and for RCU read-locked regions.

Now, what we *could* do is to say:

- we will use the high bit of the preempt count for NEED_RESCHED

- when we set/clear that high bit, we *always* use atomic sequences,
and we never change any of the other bits.

- when we increment/decrement the other counters, we *only* do so on
the local CPU, and we don't use atomic accesses.

Now, the downside of that is that *because* we don't use atomic
accesses for the inc/dec parts, the updates to the high bit can get
lost. But because the high bit updates are done with atomics, we know
that they won't mess up the actual counting bits, so at least the
count is never corrupted.
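
To make the three rules and the lost-update downside concrete, here is
a minimal user-space sketch. It is not kernel code: every name in it
(preempt_word, NEED_RESCHED_BIT, the helper functions) is hypothetical,
and the non-atomic inc/dec is modelled as a separate relaxed load and
relaxed store so that the C program stays well-defined. The last helper
replays, in a single thread, the interleaving where a remote set of the
high bit lands between the local load and store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: high bit = NEED_RESCHED, low 31 bits = count. */
#define NEED_RESCHED_BIT 0x80000000u

static _Atomic uint32_t preempt_word;

/* Any CPU: touch only the high bit, always with an atomic RMW. */
static inline void set_need_resched(void)
{
    atomic_fetch_or(&preempt_word, NEED_RESCHED_BIT);
}

static inline void clear_need_resched(void)
{
    atomic_fetch_and(&preempt_word, ~NEED_RESCHED_BIT);
}

static inline bool need_resched(void)
{
    return (atomic_load(&preempt_word) & NEED_RESCHED_BIT) != 0;
}

static inline uint32_t preempt_depth(void)
{
    return atomic_load(&preempt_word) & ~NEED_RESCHED_BIT;
}

/* Local CPU only: load, modify, store. Deliberately NOT an atomic RMW. */
static inline void preempt_count_inc(void)
{
    uint32_t v = atomic_load_explicit(&preempt_word, memory_order_relaxed);
    atomic_store_explicit(&preempt_word, v + 1, memory_order_relaxed);
}

static inline void preempt_count_dec(void)
{
    uint32_t v = atomic_load_explicit(&preempt_word, memory_order_relaxed);
    assert((v & ~NEED_RESCHED_BIT) != 0);   /* guard against underflow */
    atomic_store_explicit(&preempt_word, v - 1, memory_order_relaxed);
}

/*
 * Replay the bad interleaving by hand: the remote atomic OR arrives
 * between the local load and the local store. Returns true when the
 * high bit was lost while the count itself stayed correct.
 */
static inline bool demo_lost_need_resched(void)
{
    atomic_store(&preempt_word, 0);
    uint32_t v = atomic_load_explicit(&preempt_word, memory_order_relaxed);
    set_need_resched();                     /* "remote" CPU sets the bit */
    atomic_store_explicit(&preempt_word, v + 1, memory_order_relaxed);
    return !need_resched() && preempt_depth() == 1;
}
```

In this model the stale store overwrites the freshly set high bit, but
the count still ends up at exactly one: the bit can be lost, the count
cannot be corrupted.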

And the NEED_RESCHED bit getting lost would be very unusual. That
clearly would *not* be acceptable for RT, but it might be
acceptable for "in the unusual case where we want to preempt a thread
that was not preemptible, *and* we ended up having the extra unusual
case that preemption enable ended up missing the preempt bit, we don't
get preempted in a timely manner". It's probably impossible to ever
see in practice, and considering that for non-RT use the PREEMPT bit
is a "strong hint" rather than anything else, it sounds like it might
be acceptable.

It is obviously *not* going to be acceptable for the RT people,
though, but since they do different code sequences _anyway_, that's
not really much of an issue.

Linus