Re: [RFC] per-cpu preempt_count

From: Linus Torvalds
Date: Mon Aug 12 2013 - 13:35:55 EST


On Mon, Aug 12, 2013 at 4:51 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> The below boots to wanting to mount a root filesystem with
> CONFIG_PREEMPT=y using kvm -smp 4.

But doesn't work in general? Or you just never tested?

I think that "thread_info->preempt_count" variable would need to be
renamed to "saved_preempt_count" or similar to make sure we catch any
users. But the patch certainly looks simple otherwise.

I'm pretty sure I had a discussion about this with Paul McKenney some
time ago (because the RCU readlock is the most noticeable user of the
preempt count - the others tend to be hidden inside the out-of-line
spinlock functions etc), and I thought he had tried this and had some
problems. Maybe we've fixed things since, or maybe he missed some
case.

But if the patch really is this simple, then we should just do it. Of
course, we should double-check that the percpu preempt count is in a
cacheline that is already accessed (preferably already dirtied) by the
context switching code. And I think this should be an
architecture-specific thing, because using a percpu variable might be
good on some architectures but not others. So I get the feeling that
it should be in the x86 __switch_to(), rather than in the generic
code. I think it would fit very well with the per-cpu "old_rsp" and
"current_task" updates that we already do.
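
To make the save/restore concrete, here is a minimal user-space sketch of
what the switch would do. It is a model only, not the actual patch: the
struct and function names (model_task, model_cpu, model_switch_to) are
hypothetical, and the only name taken from the discussion above is
saved_preempt_count.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these types are the kernel's own. */
struct model_task {
    /* The count is parked here while the task is switched out. */
    int saved_preempt_count;
};

struct model_cpu {
    struct model_task *current_task; /* stands in for the per-cpu current_task */
    int preempt_count;               /* stands in for the per-cpu preempt count */
};

/*
 * Model of the arch-specific switch: save the outgoing task's count,
 * load the incoming task's count, then update the current task pointer.
 */
void model_switch_to(struct model_cpu *cpu, struct model_task *next)
{
    struct model_task *prev = cpu->current_task;

    prev->saved_preempt_count = cpu->preempt_count;
    cpu->preempt_count = next->saved_preempt_count;
    cpu->current_task = next;
}
```

The point of the model is that the hot path (preempt disable/enable) only
ever touches the per-cpu field, and the per-task copy is touched only at
switch time.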

> Adding TIF_NEED_RESCHED into the preempt count would allow a single test
> in preempt_check_resched() instead of still needing the TI. Removing
> PREEMPT_ACTIVE from preempt count should allow us to get rid of
> ti::preempt_count altogether.
>
> The only problem with TIF_NEED_RESCHED is that it's cross-cpu, which would
> make the entire thing atomic which would suck donkey balls so maybe we
> need two separate per-cpu variables?

Agreed. Making it atomic would suck, and cancel all advantages of the
better code generation to access it. Good point.

And yeah, it could be two variables in the same cacheline or something.
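
A rough sketch of the two-variable layout, as a user-space C11 model. All
names here (percpu_preempt, the model_* helpers) are hypothetical, and the
64-byte line size is an assumption rather than something any architecture
guarantees. The count is a plain int because only the owning CPU touches
it; only the resched flag, which another CPU may set, is atomic.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_CACHELINE 64 /* assumed line size, for illustration only */

/* Both fields share one cacheline; the struct is padded out to the line. */
struct percpu_preempt {
    alignas(MODEL_CACHELINE) int count; /* written only by the owning CPU */
    atomic_bool need_resched;           /* may be set from another CPU */
};

/* Plain increment: no atomic operation on the hot path. */
void model_preempt_disable(struct percpu_preempt *p)
{
    p->count++;
}

/*
 * Plain decrement. Returns true when the count reached zero and a
 * reschedule was requested, i.e. when the caller should reschedule.
 */
bool model_preempt_enable(struct percpu_preempt *p)
{
    if (--p->count != 0)
        return false;
    return atomic_load_explicit(&p->need_resched, memory_order_relaxed);
}

/* The cross-CPU side: the only place that needs an atomic store. */
void model_set_need_resched(struct percpu_preempt *p)
{
    atomic_store_explicit(&p->need_resched, true, memory_order_relaxed);
}
```

This still needs two tests on the enable path (count, then flag), so it
gives up the single-test idea in exchange for keeping the count non-atomic.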

Linus