Re: replace "memset(...,0,PAGE_SIZE)" calls with "clear_page()"?
From: dean gaudet
Date: Wed Jan 03 2007 - 01:24:11 EST
On Sat, 30 Dec 2006, Denis Vlasenko wrote:
> I was experimenting with SSE[2] clear_page() which uses
> non-temporal stores. That one requires 16 byte alignment.
>
> BTW, it worked ~300% faster than memset. But Andi Kleen
> insists that cache eviction caused by NT stores will make it
> slower in macrobenchmark.
>
> Apart from fairly extensive set of microbechmarks
> I tested kernel compiles (i.e. "real world load")
> and they are FASTER too, not slower, but Andi
> is fairly entrenched in his opinion ;)
> I gave up.
you know, with the kernel zeroing pages through the 1:1 phys mapping, and
userland accessing pages through a different mapping... it seems that
frequently virtual address bits 12..14 will differ between user and
kernel.
on K8 this results in a virtual alias conflict which costs *70 cycles* per
cache line. (K8 L1 DC uses virtual bits 12..14 as part of the index.)
this is larger than the cost for L1 miss L2 hit...
this wouldn't happen with movnt... but then we get into the handwaving
arguments about timing of accesses to the freshly zeroed page. too bad
there's no "evict from L1 to L2" operation -- that would avoid the virtual
alias problem.
there's an event (75h unit mask 02h) to measure virtual alias conflicts...
i've always wondered if there are workloads which trigger this behaviour.
it can happy on copy to/from user as well.
-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/