Re: Memory testing program

Peter T. Breuer (ptb@dit.upm.es)
Sat, 21 Sep 1996 21:39:42 +0200 (MET DST)


> > Sorry to prolong the thread. But memory errors ought to be eliminated
> > from the discussion of bugs, and advertising a memory tester as simple
> > as this is one way of doing it.
> >
> > Peter T. Breuer
>
> Peter,
>
> After your message I retried this program and compared the results
> against QAPlus and AMIDiag. The memtest86 programs reported numerous
> errors on my Linux-2.0.19/20, AMD 486DX/2-80 machine. I am using a

In that case there are memory errors*. (It is my experience that about
20% of new machines have memory errors that do not show up under DOS, or
on booting a kernel, or in normal running.) You can see from the C code
that all it does is write and then read. If your machine does not pass
that test then fine - it has memory errors by definition.

*Caveat below!

> "Green PC" motherboard with a late-1996 Award BIOS and 48-MB
> memory. The memory simms are pretty good ones; i.e., 60-ns, etc. There
> are two 72-pin, 16-MB simms and four 30-pin, 4-MB simms. The 4s are
> NEC simms and the 16s are Hitachi.
>
> These chips check out clean on the other testers, and they do not
> cause signal 11 errors, even when compiling Linux and Postgres95.

>
> Maybe I do not understand how to use the Linux memtest86 program, but
> I simply cannot believe it.

I am afraid that you should believe the evidence that appears here. You
have run a program that does nothing other than write to and read from
the same locations repeatedly, and it has reported errors.

I would suggest you next modify the test code in order to investigate in
detail the areas of memory that are reported bad. Check that the errors
don't move, or if they do, whether there is a general pattern (stuck
address line, ...)
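
To make "look for a pattern" concrete, here is a minimal sketch of the
sort of bookkeeping I mean. It is NOT code from the memtest utility:
the struct and function names are made up for illustration, and it
assumes you have already changed the test to record each failing
address along with the value written and the value read back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one failure (not a memtest structure). */
struct mem_error {
    uintptr_t addr;      /* failing address, or offset into the test area */
    uint32_t  expected;  /* pattern that was written */
    uint32_t  actual;    /* value that was read back */
};

/* Hypothetical summary over a batch of failures. */
struct error_summary {
    uint32_t  bad_data_bits;  /* data bits that differed in at least one failure */
    uintptr_t addr_always_1;  /* address bits set in every failing address */
    uintptr_t addr_always_0;  /* address bits clear in every failing address */
};

static struct error_summary summarize(const struct mem_error *e, size_t n)
{
    /* Start the "always" masks at all ones and knock bits out as we go. */
    struct error_summary s = { 0, ~(uintptr_t)0, ~(uintptr_t)0 };

    assert(e != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        s.bad_data_bits |= e[i].expected ^ e[i].actual;
        s.addr_always_1 &= e[i].addr;
        s.addr_always_0 &= ~e[i].addr;
    }
    if (n == 0) {
        /* No failures: there is nothing in common to report. */
        s.addr_always_1 = 0;
        s.addr_always_0 = 0;
    }
    return s;
}
```

If bad_data_bits comes out with only one or two bits set, suspect a
data line or a single chip. If the failing addresses all share some
address bits, that hints at an addressing problem or at one particular
simm. If nothing is in common and the errors move from run to run, it
looks more like timing or refresh.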

Note that the memtest utility has some compilation/configuration
options. These allow for testing at slow AND fast refresh rates,
using different patterns and so on. If you feel that some of them are
not appropriate tests for your machine, comment out the relevant bits.
Or try to find which configuration your machine has the errors in.

OK - now for the caveat. If you have an AMD then it may be that you are
running with the different refresh rate changes enabled in the test and
the AMD has a bad reaction to the code that does it. I know nothing
about AMDs so this is a shot in the dark. There is some assembler that
I can't read in the memtest utility that is supposed to set up a
register for this purpose (of course *I* comment out all the code
options like that which I don't understand, in order to get something
that I do :).

But I have tested tens and tens of machines with this utility and it
has distinguished exactly those machines with known errors or suspicious
symptoms from those with no errors known and no record of problems.

I recommend that you look at the source code of memtest.c and change it
to do exactly what you consider to be a valid test. Why would you trust
a DOS utility with unknown code over something which does something
clearly comprehensible? It's simple enough to change to

/* pseudocode: write(), read() and report() stand for your own helpers */
for (i = 0; i < 64 * 1024 * 1024; i++) { write(i, a); b = read(i); report(a ^ b); }

or something like that if you aren't satisfied with the way it does it.
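
For what it is worth, a compilable version of that loop might look like
the sketch below. It is an illustration only, not the memtest source:
check_region is a name I have made up, and it works on whatever buffer
you hand it. It writes the whole area before reading any of it back,
rather than reading each cell immediately after writing it, so that a
value still sitting in the cache is less likely to hide a bad cell.
Bear in mind that an ordinary user-space program only sees the memory
the kernel gives it, so it cannot cover all of physical RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fill `words` 32-bit cells with `pattern`, then read them all back.
 * Returns the number of cells whose value differs from what was written.
 * `volatile` keeps the compiler from optimising the read-back away. */
static size_t check_region(volatile uint32_t *buf, size_t words,
                           uint32_t pattern)
{
    size_t errors = 0;

    assert(buf != NULL || words == 0);

    /* Pass 1: write the pattern everywhere. */
    for (size_t i = 0; i < words; i++)
        buf[i] = pattern;

    /* Pass 2: read back and report any difference as a bit mask. */
    for (size_t i = 0; i < words; i++) {
        uint32_t got = buf[i];
        if (got != pattern) {
            fprintf(stderr, "offset %zu: wrote %08lx read %08lx diff %08lx\n",
                    i, (unsigned long)pattern, (unsigned long)got,
                    (unsigned long)(pattern ^ got));
            errors++;
        }
    }
    return errors;
}
```

Call it several times over the same area with complementary patterns,
say 0x55555555 and then 0xAAAAAAAA, so that every bit gets tested both
as a 0 and as a 1.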

Maybe you will find an error in the AMD that they will pay you not to
disclose!

> --
> Regards,
> Paul Matthews
> email: paul@matthews.com
>

Peter T. Breuer
,---------------------------------------------------------------------------
|Departamento de Ingenieria de Sistemas Telematicos, Universidad Politecnica
|de Madrid, Escuela Tecnica Superior de Ingenieros de Telecomunicacion,
|Ciudad Universitaria, E--28040 Madrid, SPAIN.
|Tel. Office : +34 (1)336 6831
| Fax : +34 (1)543 2077 or 336 7333
|Internet : <ptb@eng.cam.ac.uk, ptb@comlab.ox.ac.uk, ptb@dit.upm.es>
| URL : http://www.dit.upm.es:80/~ptb/
`---------------------------------------------------------------------------