I get improved performance on my K6-II 375/75 (overclocked) with this
code for zeroing a page:
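asm __volatile__ ( \
" prefetchw (%0); \n" /* prefetch the first line for writing */ \
" movl $0xff, %1 \n" /* counter: 256 iterations x 16 bytes = 4096 */ \
" xorl %2,%2 \n" /* %2 = 0, the value being stored */ \
"1: movl %2,(%0) \n" \
" prefetchw 0x10(%0) \n" /* prefetch the next 16 bytes */ \
" movl %2,0x4(%0) \n" \
" movl %2,0x8(%0) \n" \
" movl %2,0xc(%0) \n" \
" addl $0x10,%0 \n" /* advance to the next 16-byte chunk */ \
" decl %1 \n" \
" jns 1b " /* loop until the counter goes negative */ \
:"=r" (page), "=r" (tmp1), "=r" (tmp2) : "0" (page): "memory" );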
I can clear a page that is not in cache (if I got my benchmark right) in 17690
clock cycles (measured with rdtsc) vs. 21224 for rep stosl.
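For anyone who wants to repeat the measurement, here is a rough sketch of a
timing harness. It is not the benchmark I ran; the names read_ticks,
time_clear and clear_memset are made up for illustration, PAGE_BYTES assumes
a 4096-byte page, and memset only stands in for rep stosl. It assumes GCC or
Clang on x86 for __rdtsc(); the clock() fallback is there just so it builds
elsewhere and does not count cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>            /* __rdtsc() on GCC and Clang */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback so the sketch still builds elsewhere; these are NOT cycles. */
static uint64_t read_ticks(void) { return (uint64_t)clock(); }
#endif

#define PAGE_BYTES 4096           /* assumed page size */

typedef void (*clear_fn)(void *page);

/* Baseline to compare against; memset stands in for rep stosl here. */
static void clear_memset(void *page) { memset(page, 0, PAGE_BYTES); }

/* Time a single call of fn on page and return the elapsed tick count. */
static uint64_t time_clear(clear_fn fn, void *page)
{
    uint64_t start, end;

    assert(fn != NULL && page != NULL);
    start = read_ticks();
    fn(page);
    end = read_ticks();
    return end - start;
}
```

Call time_clear(clear_memset, page) and the same with the routine under test.
The harness does not evict the page from cache, so to time the cold case you
would first have to touch a buffer larger than the cache. A single rdtsc pair
is also noisy, so take several runs.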
The asm above is the three-register version. The speed is the same if you use
movl $0x0,(%0) in place of movl %2,(%0), but the register form has a smaller
icache footprint.
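In case the loop structure is easier to read in C, here is a rough equivalent.
It is only a sketch: zero_page_sketch and PAGE_BYTES are names I made up, and
GCC's __builtin_prefetch(addr, 1) is used as a stand-in for prefetchw, so the
compiler is free to pick a different instruction (or turn the whole loop into
something else when optimizing). Do not expect the same timings from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096   /* assumed page size */

/* Hypothetical C rendering of the asm loop: 256 iterations of 16 bytes. */
static void zero_page_sketch(void *page)
{
    uint32_t *p = page;
    size_t i;

    assert(page != NULL);
    __builtin_prefetch(p, 1);              /* hint: about to write here */
    for (i = 0; i < PAGE_BYTES / 16; i++) {
        p[0] = 0;
        __builtin_prefetch(p + 4, 1);      /* next 16 bytes, for writing */
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;                            /* advance 16 bytes */
    }
}
```

Like the asm, the last iteration prefetches the address just past the page.
That is only a hint and does not write anything there.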
--
This is not a Sig. (With homage to Magritte).