RE: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy()for unaligned copy

From: Ma, Ling
Date: Mon Oct 18 2010 - 04:02:47 EST

In reply to: Miao Xie: "Re: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy()for unaligned copy"
Next in thread: Miao Xie: "Re: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy()for unaligned copy"

>> With rep_good set, memcpy jumps to memcpy_c, so the code in this patch does not run;
>> we may continue to optimize memcpy_c later.

>Yes, but in fact the performance of memcpy_c is not better on some micro-architectures (such as
>Wolfdale-3M), especially in the unaligned cases, so we need to optimize for it, and I think
>the first step is to optimize the original code of memcpy().

As mentioned above, we will optimize memcpy_c further soon, for two reasons:
1. The movs instruction has a long startup latency.
2. The movs instruction does not perform well in the unaligned case.
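
To illustrate the unaligned case, here is a rough userspace sketch of the shift-based idea discussed in this thread: load aligned 64-bit words and merge neighbouring words with shifts instead of issuing unaligned loads or movs. This is not the code from the patch; the name copy_shifted and its interface are made up for the example, and it assumes a little-endian machine such as x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Copies nwords 64-bit words into dst from the byte stream that starts
 * `off` bytes (0..7) into the aligned word array src. When off != 0 the
 * function reads src[0] .. src[nwords], so the caller must provide
 * nwords + 1 readable source words. Assumes little-endian byte order.
 */
static void copy_shifted(uint64_t *dst, const uint64_t *src,
                         unsigned off, size_t nwords)
{
    assert(off < 8);

    if (off == 0) {
        /* Source and destination are mutually aligned: plain word copy. */
        memcpy(dst, src, nwords * sizeof(*dst));
        return;
    }

    unsigned rs = 8 * off;   /* bits to drop from the low word  */
    unsigned ls = 64 - rs;   /* bits to take from the high word */
    uint64_t lo = src[0];

    for (size_t i = 0; i < nwords; i++) {
        uint64_t hi = src[i + 1];
        /* Upper bytes of lo followed by lower bytes of hi. */
        dst[i] = (lo >> rs) | (hi << ls);
        lo = hi;             /* each source word is loaded only once */
    }
}
```

Every load and store in the loop is aligned; the cost moves into the two shifts and the OR per word, which is why the result depends so much on how a given micro-architecture handles shifts.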

>> BTW the improvement comes only from the Core2 shift register optimization,
>> but on most previous CPUs shift register operations are very sensitive because of the decode stage.
>> I have tested Atom, Opteron, and Nocona; the new patch is still better.

>I think we can add a flag so that this improvement is enabled only on Core2 and similar CPUs,
>just like X86_FEATURE_REP_GOOD.

I think we should optimize for Core2 in the memcpy_c function in the future.
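
For reference, the flag-based selection suggested in the quote above could look roughly like the userspace sketch below. It only shows the dispatch logic. The struct cpu_caps, the fast_shift flag, and all function names are hypothetical, and the three copy routines are simple stand-ins rather than the kernel implementations; the kernel itself selects between variants through its own CPU feature mechanism, which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical capability flags; fast_shift does not exist in the kernel. */
struct cpu_caps {
    bool rep_good;    /* string instructions are fast (cf. X86_FEATURE_REP_GOOD) */
    bool fast_shift;  /* shift-based unaligned copy pays off (Core2-like CPUs)   */
};

/* Stand-in for the string-instruction variant (memcpy_c in this thread). */
static void *copy_string_variant(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Stand-in for a shift-optimized variant; here just a forward byte loop. */
static void *copy_shift_variant(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Stand-in for the original unrolled variant; here a countdown byte loop. */
static void *copy_original_variant(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Pick a copy routine once, based on the reported capabilities. */
static copy_fn select_copy(const struct cpu_caps *caps)
{
    assert(caps != NULL);

    if (caps->rep_good)
        return copy_string_variant;   /* rep_good takes priority */
    if (caps->fast_shift)
        return copy_shift_variant;
    return copy_original_variant;
}
```

With this ordering, a CPU that reports rep_good never reaches the shift-based routine, which is the situation described at the start of this mail.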

Thanks
Ling

