[PATCH] Tell the world we gave up on pushing CC_OPTIMIZE_FOR_SIZE

From: Kirill Smelkov
Date: Fri Nov 02 2012 - 07:51:22 EST


[continuing 281dc5c5 "Give up on pushing CC_OPTIMIZE_FOR_SIZE"]

Recently I've been bitten hard by CC_OPTIMIZE_FOR_SIZE=y on X86
performance-wise. The problem turned out to be that with -Os gcc wants to
inline __builtin_memcpy, to which the x86 memcpy directly refers,

---- 8< ---- arch/x86/include/asm/string_32.h
#if (__GNUC__ >= 4)
#define memcpy(t, f, n) __builtin_memcpy(t, f, n)

as "rep; movsb", which is several times slower than "rep; movsl".
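
To illustrate the difference, here is a minimal userspace sketch (not
kernel code) of the two copy strategies. The helper names copy_movsb,
copy_movsl and copy_selftest are made up for this example, and the
fallback to plain memcpy on non-x86 targets is an assumption added only so
the file builds elsewhere. copy_movsl follows the usual pattern of moving
n/4 dwords first and then the 0..3 leftover bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time copy: one "rep movsb" over all n bytes. */
static void copy_movsb(void *dst, const void *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(n)
			 :
			 : "memory");
#else
	memcpy(dst, src, n);	/* assumption: portable fallback */
#endif
}

/* Dword copy: "rep movsl" for n/4 dwords, then "rep movsb" for the tail. */
static void copy_movsl(void *dst, const void *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
	size_t dwords = n / 4;
	size_t tail = n % 4;

	/* dst and src are advanced by the first instruction. */
	__asm__ volatile("rep movsl"
			 : "+D"(dst), "+S"(src), "+c"(dwords)
			 :
			 : "memory");
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(tail)
			 :
			 : "memory");
#else
	memcpy(dst, src, n);	/* assumption: portable fallback */
#endif
}

/* Both helpers must produce the same bytes as the source buffer. */
static void copy_selftest(void)
{
	char src[11] = "0123456789";	/* 11 bytes: 2 dwords + 3 tail bytes */
	char a[11], b[11];

	copy_movsb(a, src, sizeof(src));
	copy_movsl(b, src, sizeof(src));
	assert(memcmp(a, src, sizeof(src)) == 0);
	assert(memcmp(b, src, sizeof(src)) == 0);
}
```

Both helpers copy the same bytes; they differ only in how many iterations
the string instruction needs, which is where the slowdown described above
would come from. Timing them on a large buffer is left out here, since the
ratio depends on the CPU.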

For me this showed up in the vivi driver, where memcpy is used to copy
lines with colorbars, and this is one of the most significant parts of
the workload:

---- 8< ---- drivers/media/platform/vivi.c
static void vivi_fillbuff(struct vivi_dev *dev, struct vivi_buffer *buf)
{
	...

	for (h = 0; h < hmax; h++)
		memcpy(vbuf + h * wmax * dev->pixelsize,
		       dev->line + (dev->mv_count % wmax) * dev->pixelsize,
		       wmax * dev->pixelsize);

Gcc insists on using movsb, even if it knows the dest and src alignment. For
example, with gcc-4.4, -4.7 and yesterday's gcc trunk, for the following function

---- 8< ----
void doit(unsigned long *dst, unsigned long *src, unsigned n)
{
	void *__d = __builtin_assume_aligned(dst, 4);
	void *__s = __builtin_assume_aligned(src, 4);

	__builtin_memcpy(__d, __s, n);
}

it still wants to use movsb with -Os:

00000000 <doit>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   57                      push   %edi
   4:   8b 4d 10                mov    0x10(%ebp),%ecx
   7:   56                      push   %esi
   8:   8b 7d 08                mov    0x8(%ebp),%edi
   b:   8b 75 0c                mov    0xc(%ebp),%esi
   e:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
  10:   5e                      pop    %esi
  11:   5f                      pop    %edi
  12:   5d                      pop    %ebp
  13:   c3                      ret

and even if I change "n" to "4*n"...
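
For reference, the "4*n" variant could look like the sketch below (doit4 is
a hypothetical name, it is not taken from the original test). Here the
length is a visible multiple of 4 and both pointers are declared 4-byte
aligned, so in principle the whole copy could be done with dwords only.

```c
/* Hypothetical variant of doit(): n counts 4-byte units, not bytes. */
void doit4(unsigned long *dst, unsigned long *src, unsigned n)
{
	void *__d = __builtin_assume_aligned(dst, 4);
	void *__s = __builtin_assume_aligned(src, 4);

	/* The byte count is always a multiple of 4. */
	__builtin_memcpy(__d, __s, 4 * n);
}
```

Building this with something like "gcc -m32 -Os -c" and looking at it with
"objdump -d" shows which string instruction the compiler picked; the result
will depend on the gcc version in use.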

On the other hand, with -O2, it generates a call to memcpy, which at least
has rep; movsl inside it, and things work several times faster.

So tell people to not enable CC_OPTIMIZE_FOR_SIZE by default.

Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxx>
---
init/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 6fdd6e3..6a448d5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1119,7 +1119,7 @@ config CC_OPTIMIZE_FOR_SIZE
 	  Enabling this option will pass "-Os" instead of "-O2" to gcc
 	  resulting in a smaller kernel.
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config SYSCTL
 	bool
--
1.8.0.316.g291341c
