RE: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB

From: Yu, Fenghua
Date: Wed May 18 2011 - 14:35:59 EST


> -----Original Message-----
> From: Andi Kleen [mailto:andi@xxxxxxxxxxxxxx]
> Sent: Tuesday, May 17, 2011 9:05 PM
> To: Yu, Fenghua
> Cc: Andi Kleen; Ingo Molnar; Thomas Gleixner; H Peter Anvin; Mallick,
> Asit K; Linus Torvalds; Avi Kivity; Arjan van de Ven; Andrew Morton;
> linux-kernel
> Subject: RE: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by
> enhanced REP MOVSB/STOSB
> > Only memcpy are generated by gcc when gcc version >=4.3. Other
> functions
> > are defined by kernel lib.
>
> Are you sure? AFAIK it supports more.

I used gcc 4.3.2, as installed by FC10, to build the kernel with defconfig. Only memcpy is expanded by gcc as an inlined builtin. All the others (i.e. memset, clear_page, memmove, and copy_user) call the kernel lib.

It's easy to check this by disassembling the kernel binary.

Gcc 4.3.2 and FC10 are old, but not that old. They have this capability.

>
> > I would leave gcc optimization for most memcpy cases instead of
> forcing
> > memcpy to call the kernel lib memcpy. I hope gcc will catch up and
> > implement a good enhanced rep movsb/stosb solution soon. If turns out
> gcc
> > can not generate good memcpy, it's easy to switch to the patching
> kernel
> > lib memcpy.
>
> The problem is that gcc can only do that if you tell it to generate
> code for that. But it has no mechanism to patch in/out different
> variants for the same binary. So it would only work for a specially
> optimized kernel for that CPU.
>
> I suspect for smaller copies it won't make too much different anyways
> and gcc's code is probably fine. But gcc won't know that you
> can do better on large copies, so using a macro would be a way
> to tell it that.
>
> -Andi

I absolutely agree with you on that. For example, gcc builds memcpy as an inlined rep movsb for big copies. This works fine on processors with enhanced rep movsb/stosb. But it doesn't perform as well as the kernel lib memcpy on processors without enhanced rep movsb/stosb, which are most of the machines currently on the market.
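
To make the trade-off concrete, here is a minimal user-space sketch, not code from the patch set. It picks between a single rep movsb and the library copy at run time, based on the ERMS bit in CPUID (leaf 7, subleaf 0, EBX bit 9). The names cpu_has_erms, copy_rep_movsb and copy_dispatch are made up for illustration, and the run-time branch is only a stand-in for what the kernel would do by patching the code at boot.

```c
#include <stddef.h>
#include <string.h>
#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns 1 if CPUID reports ERMS
 * (leaf 7, subleaf 0, EBX bit 9), 0 otherwise. */
static int cpu_has_erms(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid_max(0, NULL) < 7)
		return 0;
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	return (ebx >> 9) & 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: copy with one "rep movsb".
 * Falls back to memcpy when not built for x86-64 with gcc/clang. */
static void *copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && defined(__GNUC__)
	void *ret = dst;

	/* RDI = destination, RSI = source, RCX = byte count. */
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(len)
			 :
			 : "memory");
	return ret;
#else
	return memcpy(dst, src, len);
#endif
}

/* Hypothetical dispatcher: a run-time branch standing in for
 * boot-time code patching. The CPUID result is cached. */
static void *copy_dispatch(void *dst, const void *src, size_t len)
{
	static int erms = -1;

	if (erms < 0)
		erms = cpu_has_erms();
	if (erms)
		return copy_rep_movsb(dst, src, len);
	return memcpy(dst, src, len);
}
```

An inlined rep movsb emitted by the compiler corresponds to calling copy_rep_movsb unconditionally, which is why it only suits a kernel built for one specific CPU.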

I discussed this issue with others before. It seems people would like to wait for a gcc with enhanced rep movsb/stosb support to arrive, and then compare performance data for the gcc version and the kernel lib version to decide which way to go.

With this patch set, at least on gcc 4.3.2, the optimization works fine for everything except memcpy, which gcc inlines.

If people don't want to wait for gcc to optimize the mem lib with ERMS, it's easy to force those functions to use the lib functions. I can send a small patch to string_64/32.h to do so.
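
As a rough illustration of the kind of macro this could be (a user-space sketch, not the actual patch): small copies whose size is a compile-time constant stay with the gcc builtin, and everything else is routed to an out-of-line routine. lib_memcpy, my_memcpy, SMALL_COPY_MAX and the call counter are all hypothetical names, and the threshold of 64 is an arbitrary assumption, not a measured value.

```c
#include <stddef.h>

/* Demo-only instrumentation: counts out-of-line calls. */
static unsigned long lib_memcpy_calls;

/* Hypothetical stand-in for the kernel lib memcpy. noinline keeps
 * the call out of line so it behaves like a real library routine. */
__attribute__((noinline))
static void *lib_memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	lib_memcpy_calls++;
	return dst;
}

/* Assumed cut-off below which the compiler's inline copy is kept. */
#define SMALL_COPY_MAX 64

/* Small compile-time-constant sizes use the builtin; all other
 * copies go to lib_memcpy. Note: len may be evaluated more than
 * once, so it should not have side effects. */
#define my_memcpy(dst, src, len)				\
	((__builtin_constant_p(len) && (len) <= SMALL_COPY_MAX)	\
		? __builtin_memcpy((dst), (src), (len))		\
		: lib_memcpy((dst), (src), (len)))
```

Dropping the __builtin_constant_p test and always calling lib_memcpy would give the simpler "always use the lib function" behaviour.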

Thanks.

-Fenghua