Re-tune x86 uaccess code for PREEMPT_VOLUNTARY v2

From: Andi Kleen
Date: Tue Aug 13 2013 - 20:09:36 EST


The x86 user access functions (*_user) were originally very well tuned,
with partially inlined code and other optimizations.

Then over time various new checks -- particularly the sleep checks for
a voluntary preempt kernel -- destroyed a lot of that tuning.

A typical user access operation now does multiple useless
function calls. Also, without force inlining, gcc's inlining
policy makes it even worse by adding more unnecessary calls.

Here's a typical example from ftrace:

 10)               |  might_fault() {
 10)               |    _cond_resched() {
 10)               |      should_resched() {
 10)               |        need_resched() {
 10)   0.063 us    |          test_ti_thread_flag();
 10)   0.643 us    |        }
 10)   1.238 us    |      }
 10)   1.845 us    |    }
 10)   2.438 us    |  }

So we spent 2.5us doing nothing (OK, it's a bit less without
ftrace, but still pretty bad).
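
For reference, that chain comes from code roughly shaped like this
(a simplified sketch of the generic code of that era, not the exact
source):

void might_fault(void)
{
        might_sleep();  /* on PREEMPT_VOLUNTARY this becomes a real call */
}

int _cond_resched(void)
{
        if (should_resched()) {         /* need_resched() ->
                                           test_ti_thread_flag() */
                __cond_resched();
                return 1;
        }
        return 0;
}

In the common case nothing needs rescheduling, so every level of that
chain is pure call overhead.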

Then in other cases we would have an out-of-line function,
but would actually do the might_sleep() checks in the inlined
caller. This doesn't make any sense at all.
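
The pattern was roughly this (a simplified sketch of the header code,
not a literal quote):

static inline unsigned long
copy_to_user(void __user *to, const void *from, unsigned long n)
{
        might_fault();  /* sleep check duplicated at every call site */
        return _copy_to_user(to, from, n);      /* real work out of line */
}

So the expensive copy was a function call anyway, while the check got
duplicated into every inlined caller.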

There were also a few other problems. For example, the x86-64 uaccess
code regularly falls back to string functions, even though a simple
mov would be enough. Every futex access to the lock variable, for
instance, went through string instructions, even though it's just
4 bytes.
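
For a fixed 4-byte access, something like this sketch (not the literal
futex change) is all that's needed, since __get_user() compiles down to
a single mov plus an exception table entry:

        u32 val;

        if (__get_user(val, (u32 __user *)uaddr))
                return -EFAULT;

rather than routing sizeof(u32) through the generic copy path.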

This patch kit is an attempt to get us back to sane code,
mostly by doing proper inlining and doing the sleep checks in the right
place. Unfortunately I had to add one tree sweep to avoid a nasty
include loop.
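
The resulting shape is roughly this (a sketch of the intent, not the
literal diff):

unsigned long _copy_to_user(void __user *to, const void *from,
                            unsigned long n)
{
        might_fault();  /* checked once here; v2 drops the resched part */
        /* ... actual copy ... */
}

static __always_inline unsigned long
copy_to_user(void __user *to, const void *from, unsigned long n)
{
        return _copy_to_user(to, from, n);      /* no extra calls inline */
}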

v2: Now completely remove reschedule checks for uaccess functions.