> 20% or so on kernel compiles.
And what do you mean by "worst case" there? 20% is a huge improvement. I
believe you were very I/O bound, but 20% is definitely interesting anyway ;).
On a PII, clearing a page with a cold cache took around 9000 CPU cycles;
with the cache hot it takes about 1000 CPU cycles.
Alpha took even fewer CPU cycles (btw, on Alpha PAGE_SIZE is twice the
i386 one ;).
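Because the page sizes differ, it can help to normalize a page-clear cost per byte before comparing machines. A minimal sketch, assuming PAGE_SIZE is 4096 bytes on i386; the function names are made up for illustration:

```c
#include <assert.h>

/* Cost of clearing one page expressed per byte, so machines with
 * different PAGE_SIZE values can be compared directly. */
double cycles_per_byte(unsigned long cycles, unsigned long page_size)
{
	assert(page_size > 0);
	return (double)cycles / (double)page_size;
}

/* How many times slower the cold-cache clear is than the hot one. */
double cold_hot_ratio(unsigned long cold_cycles, unsigned long hot_cycles)
{
	assert(hot_cycles > 0);
	return (double)cold_cycles / (double)hot_cycles;
}
```

With the PII figures, cycles_per_byte(9000, 4096) is about 2.2 and cycles_per_byte(1000, 4096) about 0.24, so the cold clear is 9 times slower. No Alpha cycle count is given here, so it is not plugged in.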
I assume that you are comparing these two cases:
1) bzero at fault time, bypassing the cache
2) case `1' plus a cache of zeroed pages generated by the idle task
Right? I am asking because if your case `1' doesn't bypass the cache too,
then the result you got is not interesting.
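The two cases could be modeled in userspace roughly as below, with case 1 serving as the fallback for case 2. This is a sketch under loud assumptions: an x86 CPU with SSE2 (which the PII discussed here does not have) so that _mm_stream_si128() can clear the page around the cache, a 4096-byte page, and a single-threaded toy pool. clear_page_nocache(), idle_prezero_page(), get_zeroed_page_at_fault() and ZERO_POOL_MAX are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <emmintrin.h>  /* SSE2: _mm_setzero_si128, _mm_stream_si128, _mm_sfence */

#define SKETCH_PAGE_SIZE 4096   /* assumed i386 page size */
#define ZERO_POOL_MAX    16     /* hypothetical pool capacity */

/* Case 1 (hypothetical helper): clear a page with non-temporal stores that
 * write around the cache. Assumes SSE2 and a 16-byte-aligned page. */
void clear_page_nocache(void *page)
{
	__m128i zero = _mm_setzero_si128();
	__m128i *p = (__m128i *)page;
	size_t i;

	assert(((uintptr_t)page & 15) == 0);
	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(__m128i); i++)
		_mm_stream_si128(p + i, zero);
	_mm_sfence();   /* streaming stores are weakly ordered */
}

/* Case 2: hypothetical pool of pages the idle task has already cleared. */
static void *zero_pool[ZERO_POOL_MAX];
static int zero_pool_count;

/* Idle-task side: clear a free page and keep it. Returns 1 if the page
 * went into the pool, 0 if the pool is full (caller keeps the page). */
int idle_prezero_page(void *page)
{
	if (zero_pool_count == ZERO_POOL_MAX)
		return 0;
	clear_page_nocache(page);
	zero_pool[zero_pool_count++] = page;
	return 1;
}

/* Fault side: hand out a pre-zeroed page if one is ready, otherwise
 * allocate one and clear it on the spot (plain case 1). */
void *get_zeroed_page_at_fault(void)
{
	void *page;

	if (zero_pool_count > 0)
		return zero_pool[--zero_pool_count];
	page = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	if (page != NULL)
		clear_page_nocache(page);
	return page;
}
```

A real kernel version would take pages from the page allocator, need locking, and have to decide when the idle task should stop refilling the pool; the sketch only shows where the clearing cost moves.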
Also, you did it on PPC. Maybe bzero has a different performance impact
there?
This is the little proggy I wrote to get the numbers (you need 10 MB of
stack to run it).
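#define __KERNEL__
#define CONFIG_X86_TSC
#include <asm/timex.h>   /* get_cycles(), cycles_t */
#include <asm/page.h>    /* PAGE_SIZE, PAGE_MASK */
#include <stdio.h>       /* printf() */
#include <strings.h>     /* bzero() */

#define POLLUTE (1024*1024*10)   /* 10 MB, much larger than the L2 cache */
#define CACHE_HOT 0              /* 0: time a cold clear; >0: time a hot one */

int main(void)
{
	int i;
	cycles_t start, stop;
	/* one page plus slack, so p can be aligned to a page boundary */
	char buf[PAGE_SIZE+~PAGE_MASK];
	char cache_pollute[POLLUTE];
	char * p;

	p = (char *)(((unsigned long) buf + ~PAGE_MASK) & PAGE_MASK);
	/* pagein */
	bzero(p, PAGE_SIZE);
	/* remove the buf from the cache */
	bzero(cache_pollute, POLLUTE);

	/* iteration 0 runs cold; any later iteration finds the page cached */
	for (i = 0; i <= CACHE_HOT; i++)
	{
		start = get_cycles();
		bzero(p, PAGE_SIZE);
		stop = get_cycles();
	}
	/* cycles_t is 64 bits wide on i386, so %u would be wrong */
	printf("cycles: %llu\n", (unsigned long long)(stop - start));
	return 0;
}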
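The program above defines __KERNEL__ to reach get_cycles() in <asm/timex.h>, which may not be usable from userspace on a current system, and it needs 10 MB of stack. A rough userspace variant, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC), could time the same cold vs. hot clear in nanoseconds instead of cycles and take the pollution buffer from the heap. The name time_page_clear() and the 4096-byte page are assumptions, and memset() stands in for bzero() without bypassing the cache.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES    4096UL                 /* assumed page size (i386) */
#define POLLUTE_BYTES (10UL * 1024 * 1024)   /* same 10 MB as POLLUTE above */

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: time one clear of `page` (PAGE_BYTES long). With
 * `cold` nonzero, the page is first pushed out of the cache by writing
 * POLLUTE_BYTES of other memory, as the program above does. Returns
 * elapsed nanoseconds, or -1 if the pollution buffer can't be allocated. */
long long time_page_clear(unsigned char *page, int cold)
{
	long long start, stop;
	size_t i;

	assert(page != NULL);
	memset(page, 0, PAGE_BYTES);          /* page in (and warm the cache) */
	if (cold) {
		volatile unsigned char *pollute = malloc(POLLUTE_BYTES);

		if (pollute == NULL)
			return -1;
		/* volatile keeps the compiler from dropping these stores */
		for (i = 0; i < POLLUTE_BYTES; i++)
			pollute[i] = 1;
		free((void *)pollute);
	}
	start = now_ns();
	memset(page, 0, PAGE_BYTES);
	stop = now_ns();
	return stop - start;
}
```

Calling it once with cold = 1 and once with cold = 0 gives the two cases; single runs are noisy, so repeating each a few times and keeping the minimum is a reasonable habit.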
Andrea