[PATCH] New csum and csum_copy routines with boot-time selection

From: Denis Vlasenko (vda@port.imtp.ilyichevsk.odessa.ua)
Date: Wed Oct 30 2002 - 07:32:37 EST


Here is a kernel patch which replaces compile-time selection
of csum and csum_copy routines with boot-time selection.
This allows the kernel to use the fastest routines _for the given CPU_.
It checks CPU capabilities before using them.

I tried to generalize the code so that it can be adapted
for memcpy, copy to/from user etc.

Theory of operation: I replace functions with pointers in .h files:

-asmlinkage unsigned int csum_partial(const unsigned char * buff, int len, unsigned int sum);
+extern
+asmlinkage unsigned int (*csum_partial)(const unsigned char * buff, int len, unsigned int sum);

Pointers are initialized to default routines:

+csum_func *csum_partial = csum_simple;
+csumcpy_func *csum_partial_copy_generic = csumcpy_simple;
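For readers who want to try the idea outside the kernel, here is a minimal user-space sketch of the same indirection. The names csum_func, csum_partial and csum_simple mirror the patch, but the function bodies, the little-endian 16-bit word order and csum_unrolled are illustrative assumptions, not the code from the attached patch.

```c
#include <stdint.h>

/* Stand-in for the patch's csum_func type (same signature as csum_partial). */
typedef unsigned int csum_func(const unsigned char *buff, int len, unsigned int sum);

/* Fold a 64-bit accumulator into 32 bits with end-around carry. */
static unsigned int fold32(uint64_t acc)
{
    while (acc >> 32)
        acc = (acc & 0xffffffffu) + (acc >> 32);
    return (unsigned int)acc;
}

/* Read one 16-bit word, little-endian (an assumption of this sketch). */
static unsigned int word16(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Illustrative default routine: one 16-bit word per iteration. */
static unsigned int csum_simple(const unsigned char *buff, int len, unsigned int sum)
{
    uint64_t acc = sum;
    int i;

    for (i = 0; i + 1 < len; i += 2)
        acc += word16(buff + i);
    if (i < len)
        acc += buff[i];                 /* odd trailing byte */
    return fold32(acc);
}

/* Hypothetical alternative: two words per iteration, same result. */
static unsigned int csum_unrolled(const unsigned char *buff, int len, unsigned int sum)
{
    uint64_t acc = sum;
    int i;

    for (i = 0; i + 3 < len; i += 4)
        acc += (uint64_t)word16(buff + i) + word16(buff + i + 2);
    for (; i + 1 < len; i += 2)
        acc += word16(buff + i);
    if (i < len)
        acc += buff[i];
    return fold32(acc);
}

/* Callers go through the pointer; it starts at the safe default. */
static csum_func *csum_partial = csum_simple;
```

Existing callers keep writing csum_partial(buf, len, sum) unchanged; assigning csum_partial = csum_unrolled later switches every call site at once.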

After benchmarking, they are replaced by the fastest ones:

+ best = find_best(bench_csum, buffer, csum_runner,
+ VECTOR_SZ(csum_runner));
+ printk("csum: using csum function: %s\n", best->name);
+ csum_partial = (csum_func*)(best->f);

bench_func.[ch] contain generic benchmarking support.
They are meant to be reused for memcpy and copy to/from user... later...
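A rough user-space sketch of what such a generic harness could look like follows. Only the find_best(bench, buffer, table, count) call shape, VECTOR_SZ and the name/f members come from the excerpts above; the struct layout, the features mask, cpu_features, the clock()-based timing and the two fill routines are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BUF_SZ       (1u << 16)
#define VECTOR_SZ(a) (sizeof(a) / sizeof((a)[0]))
#define FEAT_FAST    0x1u               /* hypothetical CPU feature bit */

typedef void generic_func(void);        /* cast back by the bench function */

struct candidate {
    const char   *name;
    generic_func *f;
    unsigned int  features;             /* features this routine requires */
};

/* A bench function returns a score; higher means faster. */
typedef unsigned long bench_func(generic_func *f, char *buffer);

static unsigned int cpu_features;       /* hypothetical: set by CPU detection */

static void fill_bytewise(char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        p[i] = 0;
}

static void fill_memset(char *p, size_t n)
{
    memset(p, 0, n);
}

/* Count how many full passes over the buffer fit into about 10 ms of CPU time. */
static unsigned long bench_fill(generic_func *f, char *buffer)
{
    void (*fill)(char *, size_t) = (void (*)(char *, size_t))f;
    clock_t end = clock() + CLOCKS_PER_SEC / 100;
    unsigned long count = 0;

    do {
        fill(buffer, BUF_SZ);
        count++;
    } while (clock() < end);
    return count;
}

static const struct candidate fill_runner[] = {
    { "bytewise", (generic_func *)fill_bytewise, 0 },
    { "memset",   (generic_func *)fill_memset,   FEAT_FAST },
};

/* Skip candidates the CPU cannot run, benchmark the rest, return the fastest
 * (or NULL if nothing was eligible). */
static const struct candidate *find_best(bench_func *bench, char *buffer,
                                         const struct candidate *tbl, size_t n)
{
    const struct candidate *best = NULL;
    unsigned long best_score = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long score;

        if (tbl[i].features & ~cpu_features)
            continue;                   /* requires a missing feature */
        score = bench(tbl[i].f, buffer);
        if (!best || score > best_score) {
            best = &tbl[i];
            best_score = score;
        }
    }
    return best;
}
```

Because the table stores a generic pointer and the bench function does the cast, the same find_best can serve csum, memcpy or copy to/from user tables, each with its own bench function.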

Some thoughts:

+/* Set this to value bigger than cache(s) */
+#define bufshift 20 /* 10=1kb,20=1Mb etc */
+#define chunksz (4*1024)

How can this be picked according to the cache size
(I mean reliably, even for future CPUs)?
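One possible direction, as a sketch only: derive the shift from a cache size reported in KB and fall back to the fixed value when no size is known. On x86 the kernel appears to record such a value in boot_cpu_data.x86_cache_size, but whether it is reliable on every CPU is exactly the open question. The helper name, the factor parameter and the MIN/MAX bounds below are assumptions, not part of the patch.

```c
#define DEFAULT_BUFSHIFT 20   /* the fixed value used in the patch: 1 MB */
#define MIN_BUFSHIFT     12   /* assumed floor: one 4 KB page */
#define MAX_BUFSHIFT     22   /* assumed cap, to keep the allocation order modest */

/*
 * Return the smallest shift s such that (1 << s) bytes is at least
 * `factor` times a cache of `cache_kb` kilobytes, clamped to
 * [MIN_BUFSHIFT, MAX_BUFSHIFT]. A cache_kb of 0 means "unknown"
 * and yields the patch's default.
 */
static unsigned int bufshift_for_cache(unsigned int cache_kb, unsigned int factor)
{
    unsigned long long want = (unsigned long long)cache_kb * 1024 * factor;
    unsigned int shift = MIN_BUFSHIFT;

    if (cache_kb == 0)
        return DEFAULT_BUFSHIFT;
    while ((1ULL << shift) < want && shift < MAX_BUFSHIFT)
        shift++;
    return shift;
}
```

With a 256 KB cache and factor 2 this gives a shift of 19 (512 KB); a CPU reporting no cache size keeps the current 1 MB buffer.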

+ char *buffer = (char *) __get_free_pages(GFP_KERNEL,
+ (bufshift-PAGE_SHIFT));

Hmmm... using a __xx function... what is the 'official' way to do it?

+#define ROUND(x) \
+SRC( movq x(%esi), %mm0 ); \
+ adcl x(%esi), %eax ; \
+ adcl x+4(%esi), %eax ; \
+DST( movntq %mm0, x(%edi) );
+// we don't need SRC() around 2nd and 3rd commands
+// (exception, if any, would be catched by 1st one)
+// (FIXME: can races against interrupts bite us?)

So, can the page disappear from under us here or not?

Currently the code is very verbose at boot time and runs too many
benchmark repetitions, for debugging purposes.

I don't know of any bugs in the code, but there may be some.
No warranties. I have given it only limited testing so far; it didn't eat
my disk and didn't drink my coffee :)

My dmesg snippet is shown below. The patch and test suite are attached
(bzipped).

--
vda

dmesg
=====
Linux version 2.4.20-pre11csumtest (root@localhost) (gcc version 3.2) #14 SMP ...
...
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 1024 buckets, 16Kbytes
TCP: Hash tables configured (established 8192 bind 10922)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Measuring network checksumming speed
allocated 256 pages
count = 34
count = 34
count = 34
count = 34
count = 34
simple : 348.160 MB/sec
count = 50
count = 50
count = 50
count = 50
count = 50
PII : 512.000 MB/sec
unsupported caps: 25
func SSE/MMX+ skipped: not supported by CPU
count = 79
count = 79
count = 79
count = 79
count = 79
SSE/MMX+ : 808.960 MB/sec
csum: using csum function: SSE/MMX+
count = 20
count = 20
count = 20
count = 20
count = 20
simple : 204.800 MB/sec
count = 19
count = 19
count = 19
count = 19
count = 19
PII : 194.560 MB/sec
unsupported caps: 25
func SSE/MMX+ skipped: not supported by CPU
count = 35
count = 35
count = 35
count = 35
count = 35
SSE/MMX+ : 358.400 MB/sec
csum: using csum_copy function: SSE/MMX+
freed 256 pages
Measuring network checksumming speed
allocated 256 pages
count = 34
count = 34
count = 34
count = 34
count = 34
simple : 348.160 MB/sec
count = 50
count = 50
count = 50
count = 50
count = 50
PII : 512.000 MB/sec
unsupported caps: 25
func SSE/MMX+ skipped: not supported by CPU
count = 79
count = 79
count = 79
count = 79
count = 79
SSE/MMX+ : 808.960 MB/sec
csum: using csum function: SSE/MMX+
count = 20
count = 20
count = 20
count = 20
count = 20
simple : 204.800 MB/sec
count = 19
count = 19
count = 19
count = 19
count = 19
PII : 194.560 MB/sec
unsupported caps: 25
func SSE/MMX+ skipped: not supported by CPU
count = 35
count = 35
count = 35
count = 35
count = 35
SSE/MMX+ : 358.400 MB/sec
csum: using csum_copy function: SSE/MMX+
freed 256 pages
VFS: Mounted root (ext2 filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 140k freed





