Re: [PATCH tip/core/urgent 3/7] rcu: Streamline code produced by __rcu_read_unlock()

From: Linus Torvalds
Date: Wed Jul 20 2011 - 18:45:53 EST

On Wed, Jul 20, 2011 at 11:26 AM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> Given some common flag combinations, particularly -Os, gcc will inline
> rcu_read_unlock_special() despite its being in an unlikely() clause.
> Use noinline to prohibit this misoptimization.

Btw, I suspect that we should at least look at what it would mean if
we made the rcu_read_lock_nesting and the preempt counters both be
per-cpu variables instead of per-thread/process counters.

Then, when we switch threads, we'd just save/restore them from the
process register save area.

There are a lot of critical code sequences (spin-lock/unlock, rcu
read-lock/unlock) that currently fetch the thread/process pointer
only to then offset it and increment the count. I get the strong
feeling that code generation could be improved, and that we could avoid
one level of indirection by just making it a per-cpu counter.

For example, instead of __rcu_read_lock looking like this (and being
an external function, partly because of header file dependencies on
the data structures involved):

push %rbp
mov %rsp,%rbp
mov %gs:0xb580,%rax
incl 0x100(%rax)

it should inline to just something like

incl %gs:0x100

instead. Same for the preempt counter.

Of course, it would involve making sure that we pick a good
cacheline, etc., one that is always dirty anyway. But other than that,
is there any real downside?
