Re: [PATCH 1/1] x86: fix text_poke

From: Ingo Molnar
Date: Fri Apr 25 2008 - 13:54:35 EST

* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> > performance i dont think we should be too worried about at this
> > moment - this code is so rarely used that it should be driven by
> > robustness i think.
>
> That really isn't true. This isn't done just once. It's done many
> thousands of times.
>
> I agree that it has to be robust, but if we want to make
> suspend/resume be instantaneous (and we do), performance does actually
> matter. Yes, this is probably much less of a problem than waiting for
> devices, and no, I haven't timed it, but if I counted right, we'll
> literally be doing almost ten thousand of these calls over a
> suspend/resume cycle.
>
> That's not "rarely used".

Yeah, it's done 2800 times on my box with a distro .config.

No strong feelings either way, but I don't think vmap()/vunmap() does
any cross-CPU TLB flush in this case. Why? Because when
alternative_instructions() runs, cpu_online_map contains just a single
CPU.

So I think the only cost is the direct vmap()/vunmap() overhead, on a
single CPU. We do a kmalloc/kfree, which is fast: sub-microsecond. We
install the pages in the ptes, which is also sub-microsecond. Even
assuming cache-cold lines (which they are most of the time), repeated
thousands of times that adds up to a few milliseconds at most, IMO.

In fact, most of the vmap()-related overhead (the kmalloc bits) should
be well-cached; the main cost should come from thrashing through all the
instruction sites and modifying them.

I just measured the actual costs. The UP/SMP offline/online transition
time with Jiri's (vmap-based) patch applied is:

# time echo 0 > /sys/devices/system/cpu/cpu1/online

real 0m0.116s
user 0m0.000s
sys 0m0.008s

# time echo 1 > /sys/devices/system/cpu/cpu1/online

real 0m0.095s
user 0m0.000s
sys 0m0.069s

With your fixmap patch:

# time echo 0 > /sys/devices/system/cpu/cpu1/online

real 0m0.110s
user 0m0.001s
sys 0m0.003s

# time echo 1 > /sys/devices/system/cpu/cpu1/online

real 0m0.099s
user 0m0.000s
sys 0m0.072s

(I ran each test multiple times and picked a representative run.)

I also did a third, control run with a kernel that had
alternative_instructions() disabled. The offline/online cost there is:

# time echo 0 > /sys/devices/system/cpu/cpu1/online

real 0m0.108s
user 0m0.000s
sys 0m0.000s

# time echo 1 > /sys/devices/system/cpu/cpu1/online

real 0m0.096s
user 0m0.000s
sys 0m0.068s

_Perhaps_ there's a decrease in time, but I couldn't say so for sure:
in the 'go online' case the numbers are too close to tell apart.

In the go-offline case there seems to be a gradual decrease (116 ms,
110 ms, 108 ms), but that could be statistical noise. (The user/sys
times are not reliable because most of this work happens with irqs off,
but the 'real' times should be.)

Ingo