Re: why does mlockall appear to make memcpy slower ?

From: Larry Woodman (woodman@missioncriticallinux.com)
Date: Mon Apr 03 2000 - 13:44:24 EST


Paul Barton-Davis wrote:

> The following program prints:
>
> Average msecs per MB 4.914158; Average copy rate: 0.000005 msecs/byte
>
> if run without root permission (i.e. mlockall() fails), and
>
> Average msecs per MB 7.417227; Average copy rate: 0.000007 msecs/byte
>
> if run with root permission.
>
> Is there a simple explanation ?
>
> --p
>
> --------------------------------------------------------------------
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <asm/msr.h>
>
> int main (void)
>
> {
> char buf[1048576];
> char obuf[1048576];
> int i;
> float total;
> unsigned long now, then;
>
> #define N 1000
> #define CYCLES_PER_MSEC 450000.0f
>
> total = 0;
>
> mlockall (MCL_CURRENT);
>
> for (i = 0; i < N; i++) {
> rdtscl (then);
> memcpy (buf, obuf, sizeof (obuf));
> rdtscl (now);
> total += now - then;
> }
>
> printf ("Average msecs per MB %.6f; "
> "Average copy rate: %.6f msecs/byte\n",
> total / (N * CYCLES_PER_MSEC),
> total / (N * 1048576.0 * CYCLES_PER_MSEC));
> }
> --------------------------------------------------------------------------
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

This runs slower when mlockall() succeeds because of what is known as
the ZERO_PAGE. At boot time the kernel allocates and zeroes out the
ZERO_PAGE. If the first access to an anonymous page is a read, the
ZERO_PAGE is mapped into that anonymous virtual page. In your case,
when mlockall() fails, all 256 pages of the obuf array end up mapping
the ZERO_PAGE. Since obuf is never modified, all reads come from the
ZERO_PAGE, and that one page is in the cache. When mlockall()
succeeds, real memory is mapped and locked into both the buf and obuf
arrays before the memcpy() loop starts. In this case there are 256
different pages mapped into the obuf array and they can't all fit in
the cache, so the program runs slower.
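
A rough way to see the same effect without mlockall() is to time copies
from a source buffer that has only ever been read, then write to each
of its pages and time the copies again. The sketch below is
illustrative only: map_anon(), touch_pages() and time_copies() are
hypothetical helper names, clock_gettime() stands in for rdtscl(), and
the compiler barrier assumes GCC or Clang.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE (1024 * 1024)  /* 1 MB, as in the original program */

/* Map a private anonymous region. Its pages have no backing until touched. */
static char *map_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (char *)p;
}

/* Write one byte per page; the write fault makes the kernel back the
 * page with real memory instead of the shared zero page. */
static void touch_pages(char *p, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);
    for (size_t off = 0; off < len; off += (size_t)page)
        ((volatile char *)p)[off] = 0;
}

/* Average seconds per memcpy() of len bytes, over `rounds` copies. */
static double time_copies(char *dst, const char *src, size_t len, int rounds)
{
    struct timespec t0, t1;

    assert(rounds > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++) {
        memcpy(dst, src, len);
        /* GCC/Clang compiler barrier: keep the copy from being elided. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((double)(t1.tv_sec - t0.tv_sec) +
            (double)(t1.tv_nsec - t0.tv_nsec) / 1e9) / rounds;
}
```

Calling time_copies() on two fresh mappings, then touch_pages() on the
source, then time_copies() again gives the two cases to compare. How
large the gap is, or whether there is one at all, depends on the cache
sizes and kernel of the machine it runs on.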

If you add a "memcpy(obuf, buf, sizeof (obuf));" after the existing
memcpy() line, you will modify both arrays and the time will be about
the same whether or not mlockall() succeeds.
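
For reference, here is a sketch of the loop with that extra line added.
It is not the original program: run_copy_loop() is a hypothetical
name, the arrays are static instead of on the stack, clock_gettime()
replaces rdtscl(), and the compiler barrier assumes GCC or Clang. Each
iteration now copies 2 MB, so the per-iteration time is not directly
comparable with the numbers in the original post.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for clock_gettime() under strict -std */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define MB 1048576

/* Static rather than on the stack (a choice made for this sketch);
 * both arrays still start out zero-initialized and untouched. */
static char buf[MB];
static char obuf[MB];

/*
 * Run `rounds` iterations of the copy plus the suggested copy back.
 * Returns the average milliseconds per iteration (2 MB per iteration).
 * *locked is set to 1 if mlockall() succeeded and 0 if it failed.
 */
static double run_copy_loop(int rounds, int *locked)
{
    struct timespec t0, t1;
    double ns;

    assert(rounds > 0);
    *locked = (mlockall(MCL_CURRENT) == 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++) {
        memcpy(buf, obuf, sizeof(obuf));
        memcpy(obuf, buf, sizeof(obuf));  /* added line: obuf is written too */
        /* GCC/Clang compiler barrier: keep the copies from being elided. */
        __asm__ volatile("" : : : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (*locked)
        munlockall();

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
         (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / 1e6 / rounds;
}
```

Because obuf is written on the first iteration, its pages get real
memory in both the locked and unlocked runs, which is why the two
timings should end up close to each other.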

Larry Woodman
http://www.missioncriticallinux.com




This archive was generated by hypermail 2b29 : Fri Apr 07 2000 - 21:00:10 EST