RE: [PATCH RFC] [x86] Optimize copy-page by reducing impact from HW prefetch

From: Ma, Ling
Date: Fri Jul 01 2011 - 04:12:29 EST


I forgot to append the experiment data:

1. We copy 4096 bytes 32 times on SNB and take the minimum execution time.
Hot-cache case:

copy_page      copy_page_c
482 cycles     350 cycles

2. The same routine as in the hot-cache case, but before each execution we copy 512 KB of data to push the original data out of L1 and L2.
Cold-cache case:

copy_page (with prefetch)    copy_page (without prefetch)    copy_page_c
853~873 cycles               1037~1051 cycles                959~976 cycles
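
For reference, the measurement procedure in 1. and 2. can be sketched in user space roughly as follows. This is not the harness used for the numbers above. The names min_copy_time_ns() and copy_page_standin() are made up for illustration, memcpy() stands in for copy_page/copy_page_c, "512k" is assumed to mean 512 * 1024 bytes, and clock_gettime() reports nanoseconds rather than cycles, so the absolute values are not comparable with the tables.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES  4096u           /* one page, as in the mail */
#define RUNS        32              /* repetitions; only the minimum is kept */
#define EVICT_BYTES (512u * 1024u)  /* assumed meaning of "512k" */

/* Monotonic timestamp in nanoseconds (not cycles). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for copy_page()/copy_page_c(): a plain 4096-byte copy. */
static void copy_page_standin(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
}

/*
 * Minimum time in ns over RUNS page copies, or UINT64_MAX if allocation fails.
 * cold == 0: hot-cache case, the same two pages are reused on every run.
 * cold != 0: 512 KB of unrelated data is copied before every timed run.
 */
uint64_t min_copy_time_ns(int cold)
{
    unsigned char *src = aligned_alloc(PAGE_BYTES, PAGE_BYTES);
    unsigned char *dst = aligned_alloc(PAGE_BYTES, PAGE_BYTES);
    unsigned char *ev_src = malloc(EVICT_BYTES);
    unsigned char *ev_dst = malloc(EVICT_BYTES);
    volatile unsigned char sink = 0;  /* keeps the copies from being optimized away */
    uint64_t best = UINT64_MAX;

    if (src && dst && ev_src && ev_dst) {
        memset(src, 0x5a, PAGE_BYTES);
        memset(dst, 0, PAGE_BYTES);
        memset(ev_src, 1, EVICT_BYTES);
        memset(ev_dst, 0, EVICT_BYTES);

        for (int i = 0; i < RUNS; i++) {
            if (cold) {
                /* Displace the test pages from the caches before timing. */
                memcpy(ev_dst, ev_src, EVICT_BYTES);
                sink ^= ev_dst[i];
            }

            uint64_t t0 = now_ns();
            copy_page_standin(dst, src);
            uint64_t t1 = now_ns();

            sink ^= dst[i];
            if (t1 - t0 < best)
                best = t1 - t0;
        }
        /* The timed copy must have produced an identical page. */
        assert(memcmp(dst, src, PAGE_BYTES) == 0);
    }

    free(src);
    free(dst);
    free(ev_src);
    free(ev_dst);
    return best;
}
```

min_copy_time_ns(0) corresponds to the hot-cache case and min_copy_time_ns(1) to the cold-cache case. Taking the minimum over the runs filters out interrupts and other one-off disturbances. To get cycle counts like the ones above, the timer would have to be replaced by a cycle counter.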

Thanks
Ling

> -----Original Message-----
> From: Ma, Ling
> Sent: Tuesday, June 28, 2011 11:24 PM
> To: 'Ingo Molnar'; Andi Kleen
> Cc: hpa@xxxxxxxxx; tglx@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH RFC] [x86] Optimize copy-page by reducing impact
> from HW prefetch
>
> Hi Ingo
>
> > Ling, mind double checking which one is the faster/better one on SNB,
> > in cold-cache and hot-cache situations, copy_page or copy_page_c?
> copy_page_c.
> On hot cache, copy_page_c on SNB combines data into 128-bit writes (the
> processor limit is 128 bits/cycle for writes) after the startup latency,
> so it is faster than copy_page, which provides 64 bits/cycle for writes.
>
> On cold cache, copy_page_c doesn't use prefetch, whereas copy_page uses
> prefetch according to the copy size,
> so the copy_page function is better.
>
> Thanks
> Ling
