Re: Csum and csum copyroutines benchmark

From: Momchil Velikov (velco@fadata.bg)
Date: Fri Oct 25 2002 - 02:48:10 EST


>>>>> "Denis" == Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> writes:

Denis> /me said:
>> I'm experimenting with different csum_ routines in userspace now.

Denis> Short conclusion:
Denis> 1. It is possible to speed up csum routines for AMD processors by 30%.
Denis> 2. It is possible to speed up csum_copy routines for both AMD and Intel
Denis> three times or more. Roy, do you like that? ;)

Additional data point:

Short summary:
1. Checksum - kernelpii_csum is ~19% faster
2. Copy - kernelpii_copy is ~6% faster
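
For reference, this is how I read "faster" here: the relative reduction in the min cycle count against the plain kernel routine. A minimal sketch of that arithmetic follows; the helper name percent_fewer_cycles is made up for illustration and is not part of the benchmark program.

```c
#include <assert.h>

/* Percentage of cycles saved by `candidate` relative to `baseline`.
 * Hypothetical helper, not taken from the benchmark source. */
static double percent_fewer_cycles(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (baseline - candidate) * 100.0 / baseline;
}
```

With the min values from the output below, percent_fewer_cycles(740, 601) comes to about 18.8 for the checksum and percent_fewer_cycles(239, 225) to about 5.9 for the copy.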

Dual Pentium III, 1266 MHz, 512K cache, 2G SDRAM (133 MHz, ECC)

The only changes I made were to decrease the buffer size to 1K (I
think this is more representative of a network packet size; correct me
if I'm wrong) and to increase the number of runs to 1024. The max
values are indeed worthless.
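
For anyone who wants to reproduce the min/max bookkeeping without the original program, here is a minimal sketch under my own assumptions. It is not Denis's code: ref_csum, now_ns, bench and struct timing are names I made up, and it times with clock_gettime() in nanoseconds rather than CPU cycles, so the absolute numbers are not comparable to the output below.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE 1024   /* 1K buffer, as in this run */
#define RUNS     1024   /* repetitions per routine */

typedef uint32_t (*csum_fn)(const unsigned char *buf, size_t len);

/* Hypothetical stand-in routine: 16-bit one's complement sum,
 * accumulated in 64 bits and folded down at the end. */
static uint32_t ref_csum(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8);
    if (i < len)                    /* odd trailing byte */
        sum += buf[i];
    while (sum >> 16)               /* end-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint32_t)sum;
}

/* Monotonic clock in nanoseconds (an assumption; the original
 * program reports CPU cycles instead). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

struct timing {
    uint64_t min, max;  /* best and worst single-run time */
    uint32_t sum;       /* checksum from the last run */
};

/* Run `fn` over the buffer `runs` times and keep min and max. */
static struct timing bench(csum_fn fn, const unsigned char *buf,
                           size_t len, int runs)
{
    struct timing t = { UINT64_MAX, 0, 0 };
    int i;

    for (i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        uint64_t dt;

        t.sum = fn(buf, len);
        dt = now_ns() - t0;
        if (dt < t.min)
            t.min = dt;
        if (dt > t.max)
            t.max = dt;
    }
    return t;
}
```

Only the min is worth comparing between routines: a single interrupt or reschedule during one of the 1024 runs is enough to inflate the max.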

Csum benchmark program
buffer size: 1 K
Each test tried 1024 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took 941 max, 740 min cycles per kb. sum=0x44000077
                     kernel_csum - took 748 max, 742 min cycles per kb. sum=0x44000077
                     kernel_csum - took 60559 max, 742 min cycles per kb. sum=0x44000077
                  kernelpii_csum - took 52804 max, 601 min cycles per kb. sum=0x44000077
                kernelpiipf_csum - took 12930 max, 601 min cycles per kb. sum=0x44000077
                        pfm_csum - took 10161 max, 1402 min cycles per kb. sum=0x44000077
                       pfm2_csum - took 864 max, 838 min cycles per kb. sum=0x44000077
copy tests:
                     kernel_copy - took 339 max, 239 min cycles per kb. sum=0x44000077
                     kernel_copy - took 239 max, 239 min cycles per kb. sum=0x44000077
                     kernel_copy - took 239 max, 239 min cycles per kb. sum=0x44000077
                  kernelpii_copy - took 244 max, 225 min cycles per kb. sum=0x44000077
                      ntqpf_copy - took 10867 max, 512 min cycles per kb. sum=0x44000077
                     ntqpfm_copy - took 710 max, 403 min cycles per kb. sum=0x44000077
                        ntq_copy - took 4535 max, 443 min cycles per kb. sum=0x44000077
                     ntqpf2_copy - took 563 max, 555 min cycles per kb. sum=0x44000077
Done

HOWEVER ...

sometimes (say 1 run in 30) I get the following output:

Csum benchmark program
buffer size: 1 K
Each test tried 1024 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took 958 max, 740 min cycles per kb. sum=0x44000077
                     kernel_csum - took 748 max, 740 min cycles per kb. sum=0x44000077
                     kernel_csum - took 752 max, 740 min cycles per kb. sum=0x44000077
                  kernelpii_csum - took 624 max, 600 min cycles per kb. sum=0x44000077
                kernelpiipf_csum - took 877211 max, 601 min cycles per kb. sum=0x44000077
Bad sum
Aborted

which is to say that the pfm_csum and pfm2_csum results are not to be
trusted, at least on a PIII (or is it my kernel's
CONFIG_MPENTIUMIII=y config?).
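
I presume the "Bad sum" abort comes from comparing each routine's result against a reference value. A rough sketch of such a check is below. Everything in it is my assumption rather than the benchmark's actual code: the names fold16, ref_csum, broken_csum, sums_match and check_or_abort are hypothetical, and comparing folded 16-bit values (instead of raw 32-bit sums) is my own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t (*csum_fn)(const unsigned char *buf, size_t len);

/* Fold a 32-bit partial sum down to 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical reference routine: returns a 32-bit partial sum. */
static uint32_t ref_csum(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8);
    if (i < len)                    /* odd trailing byte */
        sum += buf[i];
    while (sum >> 32)               /* fold 64 bits into 32 */
        sum = (sum & 0xffffffffu) + (sum >> 32);
    return (uint32_t)sum;
}

/* Deliberately wrong routine for illustration: skips the last word. */
static uint32_t broken_csum(const unsigned char *buf, size_t len)
{
    return ref_csum(buf, len >= 2 ? len - 2 : 0);
}

/* Nonzero if both routines agree on the folded checksum. */
static int sums_match(csum_fn ref, csum_fn cand,
                      const unsigned char *buf, size_t len)
{
    return fold16(ref(buf, len)) == fold16(cand(buf, len));
}

/* Mimic the abort seen in the output when a routine disagrees. */
static void check_or_abort(csum_fn ref, csum_fn cand,
                           const unsigned char *buf, size_t len)
{
    if (!sums_match(ref, cand, buf, len)) {
        fputs("Bad sum\n", stderr);
        abort();
    }
}
```

Running such a check on every one of the 1024 iterations, and not only once per routine, is what would catch an intermittent failure like the one above.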

~velco