RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

From: Ma, Ling
Date: Mon Oct 15 2012 - 01:07:32 EST

In reply to: Borislav Petkov: "Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register"
Next in thread: George Spelvin: "Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register"

Thanks Boris!
So the patch is helpful and has no impact on other/older machines.
I will re-send a new version according to the comments.
Any further comments are appreciated!

Regards
Ling

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@xxxxxxxxx]
> Sent: Sunday, October 14, 2012 6:58 PM
> To: Ma, Ling
> Cc: Konrad Rzeszutek Wilk; mingo@xxxxxxx; hpa@xxxxxxxxx;
> tglx@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; iant@xxxxxxxxxx;
> George Spelvin
> Subject: Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging
> instruction sequence and saving register
>
> On Fri, Oct 12, 2012 at 08:04:11PM +0200, Borislav Petkov wrote:
> > Right, so the benchmark shows around 20% speedup on Bulldozer but this
> > is a microbenchmark and before we pursue this further, we need to
> > verify whether this brings any palpable speedup with a real benchmark,
> > I don't know, kernbench, netbench, whatever. Even something as boring
> > as a kernel build. And probably check for perf regressions on the rest
> > of the uarches.
>
> Ok, so to summarize, on AMD we're using REP MOVSQ which is even faster
> than the unrolled version. I've added the REP MOVSQ version to the
> benchmark. It nicely validates that we're correctly setting
> X86_FEATURE_REP_GOOD on everything >= F10h and some K8s.
>
> So, to answer Konrad's question: those patches don't concern AMD
> machines.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
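
For readers following the thread, here is a minimal user-space sketch of the
two copy strategies compared above: a REP MOVSQ copy and a loop unrolled to
one 64-byte cache line per iteration. This is not the kernel's copy_page
(which is written in assembly) and not the benchmark used in this thread.
PAGE_BYTES, copy_page_rep(), copy_page_unrolled() and copy_page_selftest()
are hypothetical names chosen for illustration, and a 4096-byte page is
assumed. On non-x86-64 builds the REP MOVSQ variant falls back to memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096 /* assumed page size, hypothetical name */

/* Copy one page with REP MOVSQ: PAGE_BYTES / 8 quadwords. */
static void copy_page_rep(void *to, const void *from)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	size_t qwords = PAGE_BYTES / 8;

	/* RDI = destination, RSI = source, RCX = quadword count. */
	__asm__ volatile("rep movsq"
			 : "+D"(to), "+S"(from), "+c"(qwords)
			 :
			 : "memory");
#else
	memcpy(to, from, PAGE_BYTES); /* portable fallback */
#endif
}

/* Copy one page, 64 bytes per iteration: all loads first, then all stores. */
static void copy_page_unrolled(void *to, const void *from)
{
	uint64_t *d = to;
	const uint64_t *s = from;
	size_t i;

	for (i = 0; i < PAGE_BYTES / 8; i += 8) {
		uint64_t r0 = s[i + 0], r1 = s[i + 1];
		uint64_t r2 = s[i + 2], r3 = s[i + 3];
		uint64_t r4 = s[i + 4], r5 = s[i + 5];
		uint64_t r6 = s[i + 6], r7 = s[i + 7];

		d[i + 0] = r0; d[i + 1] = r1;
		d[i + 2] = r2; d[i + 3] = r3;
		d[i + 4] = r4; d[i + 5] = r5;
		d[i + 6] = r6; d[i + 7] = r7;
	}
}

/* Return 1 if both variants reproduce the source page exactly, else 0. */
static int copy_page_selftest(void)
{
	static uint64_t src[PAGE_BYTES / 8];
	static uint64_t dst[PAGE_BYTES / 8];
	size_t i;

	for (i = 0; i < PAGE_BYTES / 8; i++)
		src[i] = (uint64_t)i * 0x9E3779B97F4A7C15ULL + 1;

	memset(dst, 0, sizeof(dst));
	copy_page_rep(dst, src);
	if (memcmp(dst, src, PAGE_BYTES) != 0)
		return 0;

	memset(dst, 0, sizeof(dst));
	copy_page_unrolled(dst, src);
	return memcmp(dst, src, PAGE_BYTES) == 0;
}
```

The sketch only checks that both variants copy correctly. Which one is faster
depends on the microarchitecture, which is why the kernel selects between
them at boot based on X86_FEATURE_REP_GOOD rather than hard-coding one.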
