FW: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by avoid memory miss predication.

From: Ma, Ling
Date: Wed Oct 28 2009 - 02:09:50 EST


Hi Ingo
Are there any other test cases we need to run, or do you have any comments?

Best Regards
Ma Ling

________________________________________
From: Ma, Ling
Sent: October 26, 2009 16:26
To: 'mingo@xxxxxxx'
Cc: 'hpa@xxxxxxxxx'; 'tglx@xxxxxxxxxxxxx'; 'linux-kernel@xxxxxxxxxxxxxxx'
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by avoid memory miss predication.


We generated a new report for another case, where the src offset is 0x45010 and the dst offset is 0x34020,
using 'perf stat --repeat 10 ./static_rsi_45010_rdi_34020_old/new'.

The test program I wrote:
 for (i = 64; i < 4096 * 4; i++)
      do_memcpy(src, dst, i);

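The loop above can be sketched as a standalone user-space harness. This is only a minimal sketch under stated assumptions, not the actual test program: the two offsets are taken to be relative to 4096-byte-aligned buffer bases, do_memcpy() is a hypothetical wrapper around the libc memcpy() (the real helper is not shown in this mail), and the names run_copy_sweep(), align_up(), SKETCH_ALIGN and the `repeats` parameter are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ALIGN 4096u         /* assumed alignment of the buffer bases */
#define SRC_OFFSET   0x45010u      /* src offset from the report */
#define DST_OFFSET   0x34020u      /* dst offset from the report */
#define MIN_LEN      64u           /* first length in the loop */
#define MAX_LEN      (4096u * 4u)  /* exclusive upper bound, as in the loop */

/* Hypothetical stand-in: the real do_memcpy() is not shown in the mail.
 * Argument order follows the call site: (src, dst, len). */
static void do_memcpy(const unsigned char *src, unsigned char *dst, size_t len)
{
    memcpy(dst, src, len);
}

/* Round a pointer up to the next multiple of align (a power of two). */
static unsigned char *align_up(unsigned char *p, uintptr_t align)
{
    return (unsigned char *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/* Run the length sweep `repeats` times (the repeat count is an assumption).
 * Returns the total number of bytes copied, or 0 if allocation fails or
 * the last copy does not match the source. */
static unsigned long long run_copy_sweep(unsigned repeats)
{
    unsigned char *src_raw = malloc(SKETCH_ALIGN + SRC_OFFSET + MAX_LEN);
    unsigned char *dst_raw = malloc(SKETCH_ALIGN + DST_OFFSET + MAX_LEN);
    unsigned long long total = 0;

    if (src_raw && dst_raw) {
        unsigned char *src = align_up(src_raw, SKETCH_ALIGN) + SRC_OFFSET;
        unsigned char *dst = align_up(dst_raw, SKETCH_ALIGN) + DST_OFFSET;

        /* Fill the source with a simple pattern so copies can be checked. */
        for (size_t k = 0; k < MAX_LEN; k++)
            src[k] = (unsigned char)(k * 31u + 7u);

        for (unsigned r = 0; r < repeats; r++)
            for (size_t i = MIN_LEN; i < MAX_LEN; i++) {
                do_memcpy(src, dst, i);
                total += i;
            }

        /* The final copy moved MAX_LEN - 1 bytes; verify them. */
        if (repeats > 0 && memcmp(src, dst, MAX_LEN - 1) != 0)
            total = 0;
    }
    free(src_raw);
    free(dst_raw);
    return total;
}
```

Calling run_copy_sweep(1) from main() performs one pass over all lengths from 64 to 4096 * 4 - 1. Note that an optimizing compiler may inline or drop some of the copies in a sketch like this, so a real measurement would need to guard against that.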
Before the patch:
Performance counter stats for './static_rsi_45010_rdi_34020_old' (10 runs):

   54014.766012  task-clock-msecs         #      0.999 CPUs    ( +-   0.016% )
             80  context-switches         #      0.000 M/sec   ( +-   7.894% )
              0  CPU-migrations           #      0.000 M/sec   ( +-  66.667% )
           4429  page-faults              #      0.000 M/sec   ( +-   0.002% )
   136855571663  cycles                   #   2533.670 M/sec   ( +-   0.016% )
    44524796868  instructions             #      0.325 IPC     ( +-   0.008% )
         771000  cache-references         #      0.014 M/sec   ( +-  10.397% )
         541785  cache-misses             #      0.010 M/sec   ( +-   4.203% )

   54.062799203  seconds time elapsed   ( +-   0.021% )

After the patch:
Performance counter stats for './static_rsi_45010_rdi_34020_new' (10 runs):

    7570.357661  task-clock-msecs         #      0.999 CPUs    ( +-   0.350% )
             13  context-switches         #      0.000 M/sec   ( +-   9.320% )
              0  CPU-migrations           #      0.000 M/sec   ( +-     nan% )
           4429  page-faults              #      0.001 M/sec   ( +-   0.004% )
    19180782064  cycles                   #   2533.669 M/sec   ( +-   0.349% )
    44462001104  instructions             #      2.318 IPC     ( +-   0.001% )
         383673  cache-references         #      0.051 M/sec   ( +-   4.112% )
         317436  cache-misses             #      0.042 M/sec   ( +-   1.607% )

    7.581541785  seconds time elapsed   ( +-   0.343% )

The patch yields a performance improvement of 54.062799203 / 7.581541785 = 7.13x.
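For anyone who wants to re-check the ratio, the arithmetic can be reproduced from the counters in the two reports. The following is only an illustrative sketch: the struct and function names (perf_run, ipc, speedup, print_summary) are made up here, and the numbers are copied from the perf stat output above.

```c
#include <stdio.h>

/* Mean counters copied from the two perf stat reports (10 runs each). */
struct perf_run {
    double seconds;       /* "seconds time elapsed" */
    double cycles;        /* "cycles" */
    double instructions;  /* "instructions" */
};

static const struct perf_run before_patch = {
    54.062799203, 136855571663.0, 44524796868.0
};
static const struct perf_run after_patch = {
    7.581541785, 19180782064.0, 44462001104.0
};

/* Instructions per cycle for one run. */
static double ipc(const struct perf_run *r)
{
    return r->instructions / r->cycles;
}

/* Wall-clock speedup of new_run relative to old_run. */
static double speedup(const struct perf_run *old_run,
                      const struct perf_run *new_run)
{
    return old_run->seconds / new_run->seconds;
}

/* Print the derived figures next to each other. */
static void print_summary(void)
{
    printf("IPC before: %.3f\n", ipc(&before_patch));
    printf("IPC after:  %.3f\n", ipc(&after_patch));
    printf("speedup:    %.2fx\n", speedup(&before_patch, &after_patch));
}
```

The instruction counts in the two runs are nearly identical, so the gain shows up almost entirely as fewer cycles, which is consistent with the IPC going from 0.325 to 2.318.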
If you need any other test reports, please let me know.

Thanks
Ma Ling

