Re: AIX disclaim() or Tru64 madvise (MADV_DONTNEED) needed

Dr. Michael Weller (eowmob@exp-math.uni-essen.de)
Fri, 20 Aug 1999 19:15:23 +0200 (MESZ)


On 20 Aug 1999, Christoph Rohland wrote:

> Tethys <tethys@it.newsint.co.uk> writes:
>
> > Christoph Rohland <hans-christoph.rohland@sap-ag.de> writes:
[...]
> > Why is this a problem? Whether the pages ares freed immediately
> > or merely marked to be freed at a later time should be of no
> > relevance to the application, only to the OS. Or are you relying
> > on the zero-fill-on-demand behaviour? What do you do on other
> > OSes that don't support this (e.g., Solaris)?
>
> Because the other OS's do not free the pages but it only gives a hint
> to the memory subsystem that these pages are not needed in the near
> future so it can be paged out.

As a try of clarification: I don't think he relies semantically on the
zero-fill-on-demand behaviour. But he wants to tell the system that he
is going to COMPLETELY replace the pages with other data.

The point IS NOT to tell the system: You might want to page these pages
out, I'm not going to need them soon. The general idea is that linux is
efficient enough to know better if and when it will page this out or not
(Now, knowing there is no intelligence in any computer program at all,
I don't know if I should believe the authors of the mm system on that)

The point is that it makes no sense to swap out these pages at all: It
will use disk space and I/O time to page them back in. And all that only
to completely overwrite them.

Usually, the way to go would be to just mmap /dev/zero over them. But the
problem is that this is shared memory and the 'shared' aspect of the pages
must be kept although their current content can be lost until they are
modified next time.

On the other hand, if there is plenty memory at hand, it makes no sense to
replace them with 0000 either. That would only impose CPU overhead when
the pages are copied from the zero-page on write.

>From this point of efficiency, thinking about it, maybe something like
madvise(MADV_DONTCARE) or even madvise(MADV_DONTNEED) (But an application
might rely on other OS's not replacing the contents on MADV_DONTNEED;

As an application coder, I might try to just use madvise() or use a
placebo #define madvise(a,b,c) if the libc doesn't have one. If I were
you, I'd definitely try that. The more I think about this, I think the
existence of these calls implemented on the other systems falsely makes
you think there is any real point in calling them.

It is yet to be shown if the occasional (unneeded) copy-on-write from
zero-page or the occasional (unneeded) page out+in using some very good
and smart I/O system with raid (like you will have on any machine sensible
for SAP) is more overhead.

Anyway, due to the 4GB limit, you'll have to reuse the
madvise(MADV_DONTNEED) interprocess communication buffers over time.

Either you use them so rarely that the page out+in or any multiproc
mmap/munmap overhead does not matter or you use them so often, that any
pageout of OTHER pages to make room for the new pages of a reused
madvise(MADV_DONTNEED) (which make no sense to be just forgotten because
they are then immediately to be replaced by zeropages) does matter soo
much that you'll put in enough memory anywway s.t. the interprocess
communication buffers are not paged out anyway (and then you also benefit
from the now unrequired copy-on-writes from zero-page).

Thinking about that, do you really think just #define madvise(a,b,c) has
any measurable, negative effect on performance in your case? Maybe it is
even faster then replacing the memory with zero pages would be?

(the latter copy overhead could only be avoided if the system would allow
to allocate some pages with garbage entries, which it doesn't, and
actually can't do for non-root processes in the general case due to
security issues (like you get a page with confidential contents of root or
anther user proc) )

Just having to add some (actually few) MB of pageing space should not be a
problem.

Michael.

--

Michael Weller: eowmob@exp-math.uni-essen.de, eowmob@ms.exp-math.uni-essen.de, or even mat42b@spi.power.uni-essen.de. If you encounter an eowmob account on any machine in the net, it's very likely it's me.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/