Re: Ingo's PIII FXSTOR patch [2 Mar] causes problems

Kurt Garloff (K.Garloff@ping.de)
Mon, 19 Apr 1999 01:01:13 +0200


On Fri, Apr 16, 1999 at 02:38:08PM +0200, Ingo Molnar wrote:
>
> i think i've found the bug. (the bug was that __put_user is self-detecting
> operand sizes, and the twd in the 'hard' structure is a char, while in the
> user-structure it's a long, so we silently lost significant bits.) I also
> fixed two other bugs and improved performance of the conversion routines.
> Does the attached patch (against 2.2.5) run your numeric application fine
> now? [There is also a new #define in process.c that switches on the
> 'hardware-based' conversion variants, just in case there are still
> problems.]
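
The class of bug described in the quoted text (a store whose width is taken
implicitly from a field's type, so that a wide value lands in a narrow field)
can be illustrated outside the kernel. What follows is only a minimal sketch
in Python with ctypes, not the kernel's __put_user and not the patch itself.
HardState, UserState, copy_autosized and copy_checked are hypothetical names
chosen for this illustration, and the field widths (8 and 32 bits) are
assumptions standing in for "char" and "long".

```python
import ctypes


class HardState(ctypes.Structure):
    # Hypothetical stand-in for the 'hard' structure: twd is a single byte.
    _fields_ = [("twd", ctypes.c_uint8)]


class UserState(ctypes.Structure):
    # Hypothetical stand-in for the user structure: twd is 32 bits wide.
    _fields_ = [("twd", ctypes.c_uint32)]


def copy_autosized(user):
    """Copy twd; the store width is implied by the destination field's type."""
    hard = HardState()
    hard.twd = user.twd  # ctypes keeps only the low 8 bits and raises no error
    return hard.twd


def copy_checked(user):
    """Same copy, but refuse to drop bits that do not fit the narrow field."""
    width_bits = 8 * ctypes.sizeof(ctypes.c_uint8)
    if user.twd >> width_bits:
        raise ValueError("twd does not fit into the narrow field")
    return copy_autosized(user)


user = UserState(twd=0xFFFF1234)
print(hex(user.twd), "->", hex(copy_autosized(user)))
```

The unchecked copy succeeds without any diagnostic, which is why this kind of
size mismatch is easy to miss; stating the width explicitly or checking the
range, as in copy_checked, makes the loss visible.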

Ingo,

thanks for sending the patch. I tried the one you sent to me by separate
e-mail: fx-2.2.5-A4. I applied it to 2.2.5-ac3+devfs+HZ=400 and it applied
cleanly.

Here are the results:
* My benchmark now produces correct results. No strange NaNs or other errors
show up any longer.
* Logging out from KDE works without problems.
* At least two modules compiled for the patched 2.2.5-ac3-fx had problems with
the plain 2.2.5 kernel: serial and floppy. Is FP being used inside these? Or
is it just a matter of too-small kernel structs ... ?

=> I think the problems with your PIII patch are cured!

* The benchmark results are hard to interpret. I compared plain 2.2.5-ac3
results against the ones with your patch. The 2.2.5-ac3 kernel had been
running for a couple of days, whilst the patched one was freshly booted.
Strangely enough, the first results were discouraging: they were slightly
worse. After some time (compiling some KDE binaries in between), I reran
the benchmark, and this time it was slightly better than the old results.
All I can conclude is that there are some cache aliasing effects (or
whatever) which influence the benchmark more than the fxsave/fxrstor
optimization does.
If you think the difference should be measurable, I can boot into single-user
mode and run a large number of tests to get reasonable statistics.
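
One way to decide whether repeated runs resolve a difference is to compare
the gap between the mean run times with their combined standard error. The
sketch below assumes that approach; summarize and difference_is_resolved are
hypothetical helpers, the threshold k=2 is an arbitrary choice, and the sample
timings are made up for illustration (they are not measurements from this
thread).

```python
import statistics


def summarize(times):
    """Return (mean, sample standard deviation) of run times in seconds."""
    return statistics.mean(times), statistics.stdev(times)


def difference_is_resolved(a, b, k=2.0):
    """True if the two means differ by more than k combined standard errors."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    stderr = (sd_a ** 2 / len(a) + sd_b ** 2 / len(b)) ** 0.5
    return abs(mean_a - mean_b) > k * stderr


# Made-up total run times in seconds, for illustration only.
plain = [28.62, 28.41, 28.75, 28.50, 28.66]
patched = [28.40, 28.71, 28.35, 28.58, 28.49]

print("plain   mean/sd:", summarize(plain))
print("patched mean/sd:", summarize(patched))
print("difference resolved:", difference_is_resolved(plain, patched))
```

With run-to-run scatter of the size seen between a long-running and a freshly
booted kernel, a handful of runs per kernel is unlikely to be enough; the
standard error shrinks only with the square root of the number of runs.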

Appended is an sdiff (96 columns). Left side: plain kernel. Right side:
patched kernel.
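
For a quick reading of the attached comparison, the relative change of each
section total in the first (complex double) run can be computed as sketched
below. The numbers are copied from the attachment; relative_change is a
hypothetical helper, and a single pair of runs says nothing about
significance.

```python
def relative_change(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100.0


# Section totals in seconds (plain kernel, patched kernel), complex double run.
totals = {
    "Vector": (6.406, 6.335),
    "Matrix": (2.873, 2.708),
    "Matrix-Vector": (19.344, 19.355),
    "ALL": (28.623, 28.398),
}

for name, (plain, patched) in totals.items():
    print(f"{name:14s} {relative_change(plain, patched):+6.2f} %")
```

The overall total changes by less than one percent, which is of the same
order as the run-to-run variation described above.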

Thanks for looking into this problem and solving it.
-- 
Dipl.Phys. Kurt Garloff <kurt@garloff.de> [Wuppertal, FRG]
Plasma physics, high perf. computing [Linux-ix86,-axp, DUX]
PGP key: see mailheader / key servers [Linux SCSI driver: DC390]

Attachment: bench.diff

=============================================   =============================================
Numeric lib (1.9.5) benchmark 1.30 (Apr 18 19   Numeric lib (1.9.5) benchmark 1.30 (Apr 18 19
FPU CW: 0x127f (double precision), 2 threads    FPU CW: 0x127f (double precision), 2 threads
=============================================   =============================================
Vector test with builtin cplx double Vector (   Vector test with builtin cplx double Vector (
---------------------------------------------   ---------------------------------------------
Vector constructor (4) :    0.001 s           | Vector constructor (4) :    0.000 s
Setting up Vectors (3) :    0.659 s           | Setting up Vectors (3) :    0.658 s
d=emul(a,b); d+=c ( 4x):    0.355 s           | d=emul(a,b); d+=c ( 4x):    0.342 s
d = emul(a,b) + c ( 4x):    0.359 s           | d = emul(a,b) + c ( 4x):    0.348 s
d=a; d+=b; d+=c   ( 8x):    0.769 s           | d=a; d+=b; d+=c   ( 8x):    0.778 s
d = a + b + c     ( 8x):    0.656 s           | d = a + b + c     ( 8x):    0.632 s
d = " man. opt.   ( 8x):    0.379 s           | d = " man. opt.   ( 8x):    0.369 s
Search f. a value ( 4x):    0.052 s             Search f. a value ( 4x):    0.052 s
d = const*a; d+=b (16x):    1.115 s           | d = const*a; d+=b (16x):    1.146 s
d = const*a + b   (16x):    0.789 s           | d = const*a + b   (16x):    0.774 s
c = " man. opt.   (16x):    0.679 s           | c = " man. opt.   (16x):    0.645 s
fabs(c-d); (c!=d) ( 4x):    0.589 s equal     | fabs(c-d); (c!=d) ( 4x):    0.588 s equal
Freeing memory         :    0.003 s             Freeing memory         :    0.003 s
---------------------------------------------   ---------------------------------------------
Total for Vector        :   6.406 s           | Total for Vector        :   6.335 s
=============================================   =============================================
Matrix test with builtin cplx double Matrix (   Matrix test with builtin cplx double Matrix (
---------------------------------------------   ---------------------------------------------
Matrix constructor (4) :    0.001 s           | Matrix constructor (4) :    0.000 s
Setting up matrices(3) :    0.138 s             Setting up matrices(3) :    0.138 s
D = A * B; D += C ( 1x):    0.807 s           | D = A * B; D += C ( 1x):    0.731 s
D = A * B + C     ( 1x):    0.808 s           | D = A * B + C     ( 1x):    0.730 s
D=A; D+=B; D+=C   (10x):    0.163 s           | D=A; D+=B; D+=C   (10x):    0.159 s
D = A + B + C     (10x):    0.135 s           | D = A + B + C     (10x):    0.134 s
D = " man. opt.   (10x):    0.099 s           | D = " man. opt.   (10x):    0.101 s
D = const*A; D+=B (20x):    0.249 s           | D = const*A; D+=B (20x):    0.237 s
D = const*A + B   (20x):    0.194 s           | D = const*A + B   (20x):    0.195 s
C = " man. opt.   (20x):    0.171 s           | C = " man. opt.   (20x):    0.175 s
fabs(C-D); (C!=D) ( 5x):    0.108 s equal     | fabs(C-D); (C!=D) ( 5x):    0.107 s equal
Freeing memory         :    0.001 s             Freeing memory         :    0.001 s
---------------------------------------------   ---------------------------------------------
Total for Matrix        :   2.873 s           | Total for Matrix        :   2.708 s
=============================================   =============================================
Matrix-Vector operations builtin cplx double    Matrix-Vector operations builtin cplx double
Tol: 1e-10, Diag domin:  82, Rand seed: 200,    Tol: 1e-10, Diag domin:  82, Rand seed: 200,
---------------------------------------------   ---------------------------------------------
Matrix(2),Vector(6) con:    0.000 s             Matrix(2),Vector(6) con:    0.000 s
Filling Mat(1),Vec(3)  :    0.215 s           | Filling Mat(1),Vec(3)  :    0.214 s
v2 = M1 * v1      (50x):    0.924 s           | v2 = M1 * v1      (50x):    0.934 s
CGS solver             :    7.248 s iter:208, | CGS solver             :    6.952 s iter:208,
BiCGstab solver        :    7.176 s iter:220, | BiCGstab solver        :    7.403 s iter:220,
CGS with diag precond  :    0.503 s iter: 15,   CGS with diag precond  :    0.503 s iter: 15,
BiCGstab w diag precond:    0.455 s iter: 14, | BiCGstab w diag precond:    0.521 s iter: 14,
LU decomposition of M2 :    2.803 s           | LU decomposition of M2 :    2.806 s
v3 = LU_solve, compare :    0.019 s differ: 2   v3 = LU_solve, compare :    0.019 s differ: 2
Freeing memory         :    0.001 s             Freeing memory         :    0.001 s
---------------------------------------------   ---------------------------------------------
Total for Matrix-Vector :  19.344 s           | Total for Matrix-Vector :  19.355 s
=============================================   =============================================
Total for ALL           :  28.623 s           | Total for ALL           :  28.398 s
=============================================   =============================================
Compiler: Reading specs from /usr/local/lib/g   Compiler: Reading specs from /usr/local/lib/g
Bench flags: -O3 -ffast-math -felide-construc    Bench flags: -O3 -ffast-math -felide-construc
Machine info: kg1 CPU: GenuineIntel GenuineIn   Machine info: kg1 CPU: GenuineIntel GenuineIn
Son Apr 18 21:45:24 CEST 1999 up 4 days, 7:5  | Mon Apr 19 00:57:39 CEST 1999 up 10 min, 2 u

=============================================   =============================================
Numeric lib (1.9.5) benchmark 1.30 (Apr 18 19   Numeric lib (1.9.5) benchmark 1.30 (Apr 18 19
FPU CW: 0x127f (double precision), 2 threads    FPU CW: 0x127f (double precision), 2 threads
=============================================   =============================================
Vector test with double Vector (450000):        Vector test with double Vector (450000):
---------------------------------------------   ---------------------------------------------
Vector constructor (4) :    0.000 s             Vector constructor (4) :    0.000 s
Setting up Vectors (3) :    0.743 s           | Setting up Vectors (3) :    0.742 s
d=emul(a,b); d+=c ( 4x):    0.372 s             d=emul(a,b); d+=c ( 4x):    0.372 s
d = emul(a,b) + c ( 4x):    0.374 s             d = emul(a,b) + c ( 4x):    0.374 s
d=a; d+=b; d+=c   ( 8x):    0.858 s           | d=a; d+=b; d+=c   ( 8x):    0.856 s
d = a + b + c     ( 8x):    0.738 s             d = a + b + c     ( 8x):    0.738 s
d = " man. opt.   ( 8x):    0.429 s           | d = " man. opt.   ( 8x):    0.428 s
Search f. a value ( 4x):    0.132 s           | Search f. a value ( 4x):    0.131 s
d = const*a; d+=b (16x):    1.248 s           | d = const*a; d+=b (16x):    1.244 s
d = const*a + b   (16x):    0.878 s           | d = const*a + b   (16x):    0.876 s
c = " man. opt.   (16x):    0.768 s           | c = " man. opt.   (16x):    0.762 s
fabs(c-d); (c!=d) ( 4x):    0.675 s equal       fabs(c-d); (c!=d) ( 4x):    0.675 s equal
Freeing memory         :    0.003 s             Freeing memory         :    0.003 s
---------------------------------------------   ---------------------------------------------
Total for Vector        :   7.218 s           | Total for Vector        :   7.202 s
=============================================   =============================================
Matrix test with double Matrix (300x300):       Matrix test with double Matrix (300x300):
---------------------------------------------   ---------------------------------------------
Matrix constructor (4) :    0.001 s           | Matrix constructor (4) :    0.000 s
Setting up matrices(3) :    0.158 s           | Setting up matrices(3) :    0.159 s
D = A * B; D += C ( 1x):    0.979 s           | D = A * B; D += C ( 1x):    0.974 s
D = A * B + C     ( 1x):    0.974 s           | D = A * B + C     ( 1x):    0.968 s
D=A; D+=B; D+=C   (10x):    0.216 s           | D=A; D+=B; D+=C   (10x):    0.212 s
D = A + B + C     (10x):    0.180 s           | D = A + B + C     (10x):    0.177 s
D = " man. opt.   (10x):    0.113 s           | D = " man. opt.   (10x):    0.110 s
D = const*A; D+=B (20x):    0.315 s           | D = const*A; D+=B (20x):    0.301 s
D = const*A + B   (20x):    0.209 s           | D = const*A + B   (20x):    0.202 s
C = " man. opt.   (20x):    0.197 s           | C = " man. opt.   (20x):    0.182 s
fabs(C-D); (C!=D) ( 5x):    0.129 s equal     | fabs(C-D); (C!=D) ( 5x):    0.130 s equal
Freeing memory         :    0.001 s             Freeing memory         :    0.001 s
---------------------------------------------   ---------------------------------------------
Total for Matrix        :   3.472 s           | Total for Matrix        :   3.416 s
=============================================   =============================================
Matrix-Vector operations double (600) :         Matrix-Vector operations double (600) :
Tol: 1e-10, Diag domin: 122, Rand seed: 200,    Tol: 1e-10, Diag domin: 122, Rand seed: 200,
---------------------------------------------   ---------------------------------------------
Matrix(2),Vector(6) con:    0.000 s             Matrix(2),Vector(6) con:    0.000 s
Filling Mat(1),Vec(3)  :    0.245 s             Filling Mat(1),Vec(3)  :    0.245 s
v2 = M1 * v1      (50x):    0.403 s           | v2 = M1 * v1      (50x):    0.404 s
CGS solver             :    3.933 s iter:233, | CGS solver             :    3.897 s iter:233,
BiCGstab solver        :    6.441 s iter:377, | BiCGstab solver        :    6.407 s iter:377,
CGS with diag precond  :    1.561 s iter: 91, | CGS with diag precond  :    1.549 s iter: 91,
BiCGstab w diag precond:    1.770 s iter:104, | BiCGstab w diag precond:    1.769 s iter:104,
LU decomposition of M2 :    4.398 s           | LU decomposition of M2 :    4.392 s
v3 = LU_solve, compare :    0.011 s differ: 5   v3 = LU_solve, compare :    0.011 s differ: 5
Freeing memory         :    0.001 s             Freeing memory         :    0.001 s
---------------------------------------------   ---------------------------------------------
Total for Matrix-Vector :  18.764 s           | Total for Matrix-Vector :  18.674 s
=============================================   =============================================
Total for ALL           :  29.454 s           | Total for ALL           :  29.293 s
=============================================   =============================================
Compiler: Reading specs from /usr/local/lib/g   Compiler: Reading specs from /usr/local/lib/g
Bench flags: -O3 -ffast-math -felide-construc    Bench flags: -O3 -ffast-math -felide-construc
Machine info: kg1 CPU: GenuineIntel GenuineIn   Machine info: kg1 CPU: GenuineIntel GenuineIn
Son Apr 18 21:45:53 CEST 1999 up 4 days, 7:5  | Mon Apr 19 00:58:08 CEST 1999 up 10 min, 2 u

