Re: [PATCH] crypto: twofish - add x86_64/avx assembler implementation

From: Jussi Kivilinna
Date: Thu Aug 16 2012 - 10:26:08 EST


Quoting Borislav Petkov <bp@xxxxxxxxx>:

On Wed, Aug 15, 2012 at 08:34:25PM +0300, Jussi Kivilinna wrote:
About ~5% slower, probably because I was tuning for sandy-bridge and
introduced more FPU<=>CPU register moves.

Here's a new version of the patch, with the FPU<=>CPU moves from the
original implementation.

(Note: this also changes the encryption function to inline all code into
the main function; decryption still places common code in a separate
function to reduce object size. This is to measure the difference.)

Yep, looks better than the previous run and also a bit better or on par
with the initial run I did.

Thanks again. The speed gain with the patch is ~8%, which is enough for twofish-avx to pass twofish-3way.


The thing is, I'm not sure whether optimizing the thing for each uarch
is a workable solution software-wise or maybe having a single version
which performs sufficiently ok on all uarches is easier/better to
maintain without causing code bloat. Hmmm...

Agreed, testing on multiple CPUs to get a single well-working version is what I have done in the past. But purchasing all the latest CPUs on the market isn't an option for me, and for testing AVX I'm stuck with sandy-bridge :)

-Jussi

4th:
====
ran like 1st.

[ 1014.074150]
[ 1014.074150] testing speed of async ecb(twofish) encryption
[ 1014.083829] test 0 (128 bit key, 16 byte blocks): 4870055 operations in 1 seconds (77920880 bytes)
[ 1015.092757] test 1 (128 bit key, 64 byte blocks): 2043828 operations in 1 seconds (130804992 bytes)
[ 1016.099441] test 2 (128 bit key, 256 byte blocks): 606400 operations in 1 seconds (155238400 bytes)
[ 1017.105939] test 3 (128 bit key, 1024 byte blocks): 168939 operations in 1 seconds (172993536 bytes)
[ 1018.112517] test 4 (128 bit key, 8192 byte blocks): 21777 operations in 1 seconds (178397184 bytes)
[ 1019.119035] test 5 (192 bit key, 16 byte blocks): 4882254 operations in 1 seconds (78116064 bytes)
[ 1020.125716] test 6 (192 bit key, 64 byte blocks): 2043230 operations in 1 seconds (130766720 bytes)
[ 1021.132391] test 7 (192 bit key, 256 byte blocks): 607477 operations in 1 seconds (155514112 bytes)
[ 1022.138889] test 8 (192 bit key, 1024 byte blocks): 168743 operations in 1 seconds (172792832 bytes)
[ 1023.145476] test 9 (192 bit key, 8192 byte blocks): 21442 operations in 1 seconds (175652864 bytes)
[ 1024.152012] test 10 (256 bit key, 16 byte blocks): 4891863 operations in 1 seconds (78269808 bytes)
[ 1025.158684] test 11 (256 bit key, 64 byte blocks): 2049390 operations in 1 seconds (131160960 bytes)
[ 1026.165366] test 12 (256 bit key, 256 byte blocks): 606847 operations in 1 seconds (155352832 bytes)
[ 1027.171841] test 13 (256 bit key, 1024 byte blocks): 169228 operations in 1 seconds (173289472 bytes)
[ 1028.178436] test 14 (256 bit key, 8192 byte blocks): 21773 operations in 1 seconds (178364416 bytes)
[ 1029.184981]
[ 1029.184981] testing speed of async ecb(twofish) decryption
[ 1029.194508] test 0 (128 bit key, 16 byte blocks): 4931065 operations in 1 seconds (78897040 bytes)
[ 1030.199640] test 1 (128 bit key, 64 byte blocks): 2056931 operations in 1 seconds (131643584 bytes)
[ 1031.206303] test 2 (128 bit key, 256 byte blocks): 589409 operations in 1 seconds (150888704 bytes)
[ 1032.212832] test 3 (128 bit key, 1024 byte blocks): 163681 operations in 1 seconds (167609344 bytes)
[ 1033.219443] test 4 (128 bit key, 8192 byte blocks): 21062 operations in 1 seconds (172539904 bytes)
[ 1034.225979] test 5 (192 bit key, 16 byte blocks): 4931537 operations in 1 seconds (78904592 bytes)
[ 1035.232608] test 6 (192 bit key, 64 byte blocks): 2053989 operations in 1 seconds (131455296 bytes)
[ 1036.239289] test 7 (192 bit key, 256 byte blocks): 589591 operations in 1 seconds (150935296 bytes)
[ 1037.241784] test 8 (192 bit key, 1024 byte blocks): 163565 operations in 1 seconds (167490560 bytes)
[ 1038.244387] test 9 (192 bit key, 8192 byte blocks): 20899 operations in 1 seconds (171204608 bytes)
[ 1039.250923] test 10 (256 bit key, 16 byte blocks): 4937343 operations in 1 seconds (78997488 bytes)
[ 1040.257589] test 11 (256 bit key, 64 byte blocks): 2050678 operations in 1 seconds (131243392 bytes)
[ 1041.264262] test 12 (256 bit key, 256 byte blocks): 586869 operations in 1 seconds (150238464 bytes)
[ 1042.270753] test 13 (256 bit key, 1024 byte blocks): 163548 operations in 1 seconds (167473152 bytes)
[ 1043.277365] test 14 (256 bit key, 8192 byte blocks): 21053 operations in 1 seconds (172466176 bytes)
[ 1044.283892]
[ 1044.283892] testing speed of async cbc(twofish) encryption
[ 1044.293349] test 0 (128 bit key, 16 byte blocks): 5186240 operations in 1 seconds (82979840 bytes)
[ 1045.298534] test 1 (128 bit key, 64 byte blocks): 1921034 operations in 1 seconds (122946176 bytes)
[ 1046.305207] test 2 (128 bit key, 256 byte blocks): 542787 operations in 1 seconds (138953472 bytes)
[ 1047.311699] test 3 (128 bit key, 1024 byte blocks): 141399 operations in 1 seconds (144792576 bytes)
[ 1048.318312] test 4 (128 bit key, 8192 byte blocks): 17755 operations in 1 seconds (145448960 bytes)
[ 1049.324829] test 5 (192 bit key, 16 byte blocks): 5196441 operations in 1 seconds (83143056 bytes)
[ 1050.331485] test 6 (192 bit key, 64 byte blocks): 1921456 operations in 1 seconds (122973184 bytes)
[ 1051.338157] test 7 (192 bit key, 256 byte blocks): 543581 operations in 1 seconds (139156736 bytes)
[ 1052.344658] test 8 (192 bit key, 1024 byte blocks): 141473 operations in 1 seconds (144868352 bytes)
[ 1053.351270] test 9 (192 bit key, 8192 byte blocks): 17601 operations in 1 seconds (144187392 bytes)
[ 1054.357823] test 10 (256 bit key, 16 byte blocks): 5190283 operations in 1 seconds (83044528 bytes)
[ 1055.364462] test 11 (256 bit key, 64 byte blocks): 1912796 operations in 1 seconds (122418944 bytes)
[ 1056.371134] test 12 (256 bit key, 256 byte blocks): 542719 operations in 1 seconds (138936064 bytes)
[ 1057.377643] test 13 (256 bit key, 1024 byte blocks): 141377 operations in 1 seconds (144770048 bytes)
[ 1058.384229] test 14 (256 bit key, 8192 byte blocks): 17752 operations in 1 seconds (145424384 bytes)
[ 1059.390799]
[ 1059.390799] testing speed of async cbc(twofish) decryption
[ 1059.400187] test 0 (128 bit key, 16 byte blocks): 4889197 operations in 1 seconds (78227152 bytes)
[ 1060.405460] test 1 (128 bit key, 64 byte blocks): 1980831 operations in 1 seconds (126773184 bytes)
[ 1061.408145] test 2 (128 bit key, 256 byte blocks): 568695 operations in 1 seconds (145585920 bytes)
[ 1062.410647] test 3 (128 bit key, 1024 byte blocks): 158294 operations in 1 seconds (162093056 bytes)
[ 1063.417258] test 4 (128 bit key, 8192 byte blocks): 20312 operations in 1 seconds (166395904 bytes)
[ 1064.423758] test 5 (192 bit key, 16 byte blocks): 4904906 operations in 1 seconds (78478496 bytes)
[ 1065.430440] test 6 (192 bit key, 64 byte blocks): 1983636 operations in 1 seconds (126952704 bytes)
[ 1066.437104] test 7 (192 bit key, 256 byte blocks): 564340 operations in 1 seconds (144471040 bytes)
[ 1067.443613] test 8 (192 bit key, 1024 byte blocks): 157404 operations in 1 seconds (161181696 bytes)
[ 1068.450216] test 9 (192 bit key, 8192 byte blocks): 20055 operations in 1 seconds (164290560 bytes)
[ 1069.456753] test 10 (256 bit key, 16 byte blocks): 4901215 operations in 1 seconds (78419440 bytes)
[ 1070.463417] test 11 (256 bit key, 64 byte blocks): 1978968 operations in 1 seconds (126653952 bytes)
[ 1071.470073] test 12 (256 bit key, 256 byte blocks): 568440 operations in 1 seconds (145520640 bytes)
[ 1072.476580] test 13 (256 bit key, 1024 byte blocks): 158329 operations in 1 seconds (162128896 bytes)
[ 1073.483177] test 14 (256 bit key, 8192 byte blocks): 20311 operations in 1 seconds (166387712 bytes)
[ 1074.489739]
[ 1074.489739] testing speed of async ctr(twofish) encryption
[ 1074.499266] test 0 (128 bit key, 16 byte blocks): 4565109 operations in 1 seconds (73041744 bytes)
[ 1075.504391] test 1 (128 bit key, 64 byte blocks): 1955085 operations in 1 seconds (125125440 bytes)
[ 1076.511055] test 2 (128 bit key, 256 byte blocks): 573971 operations in 1 seconds (146936576 bytes)
[ 1077.517563] test 3 (128 bit key, 1024 byte blocks): 158489 operations in 1 seconds (162292736 bytes)
[ 1078.524175] test 4 (128 bit key, 8192 byte blocks): 20330 operations in 1 seconds (166543360 bytes)
[ 1079.530702] test 5 (192 bit key, 16 byte blocks): 4550468 operations in 1 seconds (72807488 bytes)
[ 1080.537358] test 6 (192 bit key, 64 byte blocks): 1943897 operations in 1 seconds (124409408 bytes)
[ 1081.544030] test 7 (192 bit key, 256 byte blocks): 564033 operations in 1 seconds (144392448 bytes)
[ 1082.550531] test 8 (192 bit key, 1024 byte blocks): 157126 operations in 1 seconds (160897024 bytes)
[ 1083.557170] test 9 (192 bit key, 8192 byte blocks): 20121 operations in 1 seconds (164831232 bytes)
[ 1084.563713] test 10 (256 bit key, 16 byte blocks): 4403637 operations in 1 seconds (70458192 bytes)
[ 1085.570360] test 11 (256 bit key, 64 byte blocks): 1961264 operations in 1 seconds (125520896 bytes)
[ 1086.577008] test 12 (256 bit key, 256 byte blocks): 571514 operations in 1 seconds (146307584 bytes)
[ 1087.583517] test 13 (256 bit key, 1024 byte blocks): 158342 operations in 1 seconds (162142208 bytes)
[ 1088.590121] test 14 (256 bit key, 8192 byte blocks): 20392 operations in 1 seconds (167051264 bytes)
[ 1089.596648]
[ 1089.596648] testing speed of async ctr(twofish) decryption
[ 1089.606061] test 0 (128 bit key, 16 byte blocks): 4517104 operations in 1 seconds (72273664 bytes)
[ 1090.611326] test 1 (128 bit key, 64 byte blocks): 1953102 operations in 1 seconds (124998528 bytes)
[ 1091.617989] test 2 (128 bit key, 256 byte blocks): 574354 operations in 1 seconds (147034624 bytes)
[ 1092.624497] test 3 (128 bit key, 1024 byte blocks): 158402 operations in 1 seconds (162203648 bytes)
[ 1093.631110] test 4 (128 bit key, 8192 byte blocks): 20369 operations in 1 seconds (166862848 bytes)
[ 1094.637618] test 5 (192 bit key, 16 byte blocks): 4524710 operations in 1 seconds (72395360 bytes)
[ 1095.644293] test 6 (192 bit key, 64 byte blocks): 1940148 operations in 1 seconds (124169472 bytes)
[ 1096.650957] test 7 (192 bit key, 256 byte blocks): 567684 operations in 1 seconds (145327104 bytes)
[ 1097.657466] test 8 (192 bit key, 1024 byte blocks): 158922 operations in 1 seconds (162736128 bytes)
[ 1098.664088] test 9 (192 bit key, 8192 byte blocks): 20087 operations in 1 seconds (164552704 bytes)
[ 1099.670596] test 10 (256 bit key, 16 byte blocks): 4397085 operations in 1 seconds (70353360 bytes)
[ 1100.677278] test 11 (256 bit key, 64 byte blocks): 1961007 operations in 1 seconds (125504448 bytes)
[ 1101.683933] test 12 (256 bit key, 256 byte blocks): 577961 operations in 1 seconds (147958016 bytes)
[ 1102.690452] test 13 (256 bit key, 1024 byte blocks): 158836 operations in 1 seconds (162648064 bytes)
[ 1103.697038] test 14 (256 bit key, 8192 byte blocks): 20427 operations in 1 seconds (167337984 bytes)
[ 1104.703575]
[ 1104.703575] testing speed of async lrw(twofish) encryption
[ 1104.713108] test 0 (256 bit key, 16 byte blocks): 3555452 operations in 1 seconds (56887232 bytes)
[ 1105.718261] test 1 (256 bit key, 64 byte blocks): 1617632 operations in 1 seconds (103528448 bytes)
[ 1106.724924] test 2 (256 bit key, 256 byte blocks): 495199 operations in 1 seconds (126770944 bytes)
[ 1107.731442] test 3 (256 bit key, 1024 byte blocks): 137358 operations in 1 seconds (140654592 bytes)
[ 1108.738065] test 4 (256 bit key, 8192 byte blocks): 17637 operations in 1 seconds (144482304 bytes)
[ 1109.740593] test 5 (320 bit key, 16 byte blocks): 3478175 operations in 1 seconds (55650800 bytes)
[ 1110.743248] test 6 (320 bit key, 64 byte blocks): 1591957 operations in 1 seconds (101885248 bytes)
[ 1111.749911] test 7 (320 bit key, 256 byte blocks): 493803 operations in 1 seconds (126413568 bytes)
[ 1112.756430] test 8 (320 bit key, 1024 byte blocks): 137066 operations in 1 seconds (140355584 bytes)
[ 1113.763034] test 9 (320 bit key, 8192 byte blocks): 17288 operations in 1 seconds (141623296 bytes)
[ 1114.769587] test 10 (384 bit key, 16 byte blocks): 3576437 operations in 1 seconds (57222992 bytes)
[ 1115.776232] test 11 (384 bit key, 64 byte blocks): 1587771 operations in 1 seconds (101617344 bytes)
[ 1116.782890] test 12 (384 bit key, 256 byte blocks): 493841 operations in 1 seconds (126423296 bytes)
[ 1117.789396] test 13 (384 bit key, 1024 byte blocks): 137324 operations in 1 seconds (140619776 bytes)
[ 1118.795993] test 14 (384 bit key, 8192 byte blocks): 17625 operations in 1 seconds (144384000 bytes)
[ 1119.802548]
[ 1119.802548] testing speed of async lrw(twofish) decryption
[ 1119.811940] test 0 (256 bit key, 16 byte blocks): 3590161 operations in 1 seconds (57442576 bytes)
[ 1120.817198] test 1 (256 bit key, 64 byte blocks): 1623745 operations in 1 seconds (103919680 bytes)
[ 1121.823872] test 2 (256 bit key, 256 byte blocks): 482001 operations in 1 seconds (123392256 bytes)
[ 1122.830398] test 3 (256 bit key, 1024 byte blocks): 133842 operations in 1 seconds (137054208 bytes)
[ 1123.836992] test 4 (256 bit key, 8192 byte blocks): 17195 operations in 1 seconds (140861440 bytes)
[ 1124.843536] test 5 (320 bit key, 16 byte blocks): 3536998 operations in 1 seconds (56591968 bytes)
[ 1125.850156] test 6 (320 bit key, 64 byte blocks): 1625698 operations in 1 seconds (104044672 bytes)
[ 1126.856830] test 7 (320 bit key, 256 byte blocks): 482518 operations in 1 seconds (123524608 bytes)
[ 1127.863348] test 8 (320 bit key, 1024 byte blocks): 133672 operations in 1 seconds (136880128 bytes)
[ 1128.869959] test 9 (320 bit key, 8192 byte blocks): 16860 operations in 1 seconds (138117120 bytes)
[ 1129.876469] test 10 (384 bit key, 16 byte blocks): 3637750 operations in 1 seconds (58204000 bytes)
[ 1130.883151] test 11 (384 bit key, 64 byte blocks): 1626131 operations in 1 seconds (104072384 bytes)
[ 1131.889814] test 12 (384 bit key, 256 byte blocks): 483999 operations in 1 seconds (123903744 bytes)
[ 1132.896324] test 13 (384 bit key, 1024 byte blocks): 133598 operations in 1 seconds (136804352 bytes)
[ 1133.902920] test 14 (384 bit key, 8192 byte blocks): 17206 operations in 1 seconds (140951552 bytes)
[ 1134.905485]
[ 1134.905485] testing speed of async xts(twofish) encryption
[ 1134.905501] test 0 (256 bit key, 16 byte blocks): 2908165 operations in 1 seconds (46530640 bytes)
[ 1135.908137] test 1 (256 bit key, 64 byte blocks): 1462715 operations in 1 seconds (93613760 bytes)
[ 1136.914715] test 2 (256 bit key, 256 byte blocks): 506478 operations in 1 seconds (129658368 bytes)
[ 1137.921320] test 3 (256 bit key, 1024 byte blocks): 148018 operations in 1 seconds (151570432 bytes)
[ 1138.927924] test 4 (256 bit key, 8192 byte blocks): 19435 operations in 1 seconds (159211520 bytes)
[ 1139.934451] test 5 (384 bit key, 16 byte blocks): 2905195 operations in 1 seconds (46483120 bytes)
[ 1140.941116] test 6 (384 bit key, 64 byte blocks): 1454656 operations in 1 seconds (93097984 bytes)
[ 1141.947683] test 7 (384 bit key, 256 byte blocks): 504479 operations in 1 seconds (129146624 bytes)
[ 1142.954280] test 8 (384 bit key, 1024 byte blocks): 148172 operations in 1 seconds (151728128 bytes)
[ 1143.960892] test 9 (384 bit key, 8192 byte blocks): 19433 operations in 1 seconds (159195136 bytes)
[ 1144.967410] test 10 (512 bit key, 16 byte blocks): 2904583 operations in 1 seconds (46473328 bytes)
[ 1145.974091] test 11 (512 bit key, 64 byte blocks): 1501387 operations in 1 seconds (96088768 bytes)
[ 1146.980652] test 12 (512 bit key, 256 byte blocks): 504501 operations in 1 seconds (129152256 bytes)
[ 1147.987254] test 13 (512 bit key, 1024 byte blocks): 148180 operations in 1 seconds (151736320 bytes)
[ 1148.993842] test 14 (512 bit key, 8192 byte blocks): 19439 operations in 1 seconds (159244288 bytes)
[ 1150.000380]
[ 1150.000380] testing speed of async xts(twofish) decryption
[ 1150.009770] test 0 (256 bit key, 16 byte blocks): 3007004 operations in 1 seconds (48112064 bytes)
[ 1151.015056] test 1 (256 bit key, 64 byte blocks): 1534733 operations in 1 seconds (98222912 bytes)
[ 1152.021642] test 2 (256 bit key, 256 byte blocks): 508129 operations in 1 seconds (130081024 bytes)
[ 1153.028246] test 3 (256 bit key, 1024 byte blocks): 144920 operations in 1 seconds (148398080 bytes)
[ 1154.034859] test 4 (256 bit key, 8192 byte blocks): 18870 operations in 1 seconds (154583040 bytes)
[ 1155.041367] test 5 (384 bit key, 16 byte blocks): 3009083 operations in 1 seconds (48145328 bytes)
[ 1156.048040] test 6 (384 bit key, 64 byte blocks): 1535084 operations in 1 seconds (98245376 bytes)
[ 1157.054609] test 7 (384 bit key, 256 byte blocks): 508112 operations in 1 seconds (130076672 bytes)
[ 1158.061215] test 8 (384 bit key, 1024 byte blocks): 145035 operations in 1 seconds (148515840 bytes)
[ 1159.067830] test 9 (384 bit key, 8192 byte blocks): 18890 operations in 1 seconds (154746880 bytes)
[ 1160.070368] test 10 (512 bit key, 16 byte blocks): 3076988 operations in 1 seconds (49231808 bytes)
[ 1161.073040] test 11 (512 bit key, 64 byte blocks): 1540659 operations in 1 seconds (98602176 bytes)
[ 1162.079610] test 12 (512 bit key, 256 byte blocks): 508316 operations in 1 seconds (130128896 bytes)
[ 1163.086195] test 13 (512 bit key, 1024 byte blocks): 144951 operations in 1 seconds (148429824 bytes)
[ 1164.092792] test 14 (512 bit key, 8192 byte blocks): 18865 operations in 1 seconds (154542080 bytes)
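
To compare runs like the one above, the per-test byte counts can be turned into throughput figures. The following is a minimal Python sketch, not part of the patch or of tcrypt. The names parse_line and throughput_mib_s are made up for illustration, and the regular expression assumes only the line format seen in this log.

```python
import re

# Matches speed-test result lines in the format seen in this log, e.g.
# "[ 1014.083829] test 0 (128 bit key, 16 byte blocks): 4870055 operations in 1 seconds (77920880 bytes)"
LINE_RE = re.compile(
    r"test\s+(\d+)\s+\((\d+) bit key, (\d+) byte blocks\):\s+"
    r"(\d+) operations in (\d+) seconds \((\d+) bytes\)"
)


def parse_line(line):
    """Return a dict of the numeric fields, or None for non-result lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    test, key_bits, block_bytes, ops, seconds, total_bytes = map(int, m.groups())
    return {
        "test": test,
        "key_bits": key_bits,
        "block_bytes": block_bytes,
        "ops": ops,
        "seconds": seconds,
        "total_bytes": total_bytes,
    }


def throughput_mib_s(rec):
    """Throughput in MiB/s (1 MiB = 1024 * 1024 bytes)."""
    return rec["total_bytes"] / rec["seconds"] / (1024 * 1024)


# Two lines copied from the ecb(twofish) encryption section of the log.
sample = [
    "[ 1014.083829] test 0 (128 bit key, 16 byte blocks): 4870055 operations in 1 seconds (77920880 bytes)",
    "[ 1018.112517] test 4 (128 bit key, 8192 byte blocks): 21777 operations in 1 seconds (178397184 bytes)",
]

for line in sample:
    rec = parse_line(line)
    if rec is None:
        continue
    # Sanity check: the reported byte count is operations times block size.
    assert rec["ops"] * rec["block_bytes"] == rec["total_bytes"]
    print(f'{rec["block_bytes"]:5d}-byte blocks: {throughput_mib_s(rec):.1f} MiB/s')
```

Feeding two full logs through parse_line and dividing the matching total_bytes values test by test gives the relative speed-up for each key size and block size, which is more informative than a single averaged percentage.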

--
Regards/Gruss,
Boris.




