Re: [uml-devel] Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19

From: Blaisorblade
Date: Thu Nov 03 2005 - 22:21:30 EST


On Thursday 03 November 2005 06:41, Rob Landley wrote:
> On Wednesday 02 November 2005 23:26, Jeff Dike wrote:
> > On Wed, Nov 02, 2005 at 05:28:35PM -0600, Rob Landley wrote:
> > > With fragmentation reduction and prezeroing, UML suddenly gains the
> > > option of calling madvise(DONT_NEED) on sufficiently large blocks as A)
> > > a fast way of prezeroing, B) a way of giving memory back to the host OS
> > > when it's not in use.

> > DONT_NEED is insufficient. It doesn't discard the data in dirty
> > file-backed pages.

> I thought DONT_NEED would discard the page cache, and punch was only needed
> to free up the disk space.

This is correct, but...

> I was hoping that since the file was deleted from disk and is already
> getting _some_ special treatment (since it's a longstanding "poor man's
> shared memory" hack), that madvise wouldn't flush the data to disk, but
> would just zero it out. A bit optimistic on my part, I know. :)

I read at some point that this optimization once existed, but was deemed
obsolete and removed.

Why obsolete? Because... we have tmpfs! And that's the point. With DONTNEED,
we only detach the references from the page tables; the content stays pinned,
because on tmpfs the page cache _is_ the "disk"! (And you have TMPDIR on
tmpfs, right?)
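
To make the point concrete, here is a minimal Python sketch (Linux only,
Python 3.8+ for mmap.madvise). It contrasts MADV_DONTNEED on a shared mapping
of an unlinked file, which is roughly the "poor man's shared memory" setup
discussed above, with the same call on private anonymous memory. The helper
names and the payload are made up for illustration; this is not UML code.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def dontneed_on_shared_file(payload=b"guest data"):
    """Return what a MAP_SHARED file mapping holds after MADV_DONTNEED."""
    # Create a backing file, then unlink it right away: the open fd keeps
    # the file alive, as in the deleted-file trick described in the thread.
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)
        os.ftruncate(fd, PAGE)
        with mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED) as m:
            m[:len(payload)] = payload
            # Drops our page-table references only; the page cache still
            # holds the dirty page, so the next access faults it back in.
            m.madvise(mmap.MADV_DONTNEED)
            return bytes(m[:len(payload)])
    finally:
        os.close(fd)


def dontneed_on_private_anon(payload=b"guest data"):
    """Return what a private anonymous mapping holds after MADV_DONTNEED."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    with mmap.mmap(-1, PAGE, flags=flags) as m:
        m[:len(payload)] = payload
        # Nothing backs these pages, so Linux refills them with zeroes
        # on the next access.
        m.madvise(mmap.MADV_DONTNEED)
        return bytes(m[:len(payload)])
```

On Linux the first helper should hand back the payload unchanged, while the
second should hand back zero bytes: only the anonymous case behaves like
"prezeroing".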

> > Badari Pulavarty has a test patch (google for madvise(MADV_REMOVE))
> > which does do the trick, and I have a UML patch which adds memory
> > hotplug. This combination does free memory back to the host.

> I saw it wander by, and am all for it. If it goes in, it's obviously the
> right thing to use.

By the way, on this side of the picture, I don't think fragmentation
avoidance is needed for that.

I guess you are referring to using frag. avoidance on the guest (if it matters
for the host, let me know). Once it is present, using it will be nice, but
currently we'd call madvise() page by page, and on non-consecutive pages
(basically, free pages we either find or free on purpose).
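
A rough Python sketch of what "page by page" costs (Linux only, Python 3.8+).
The names coalesce() and release() are hypothetical, and MADV_DONTNEED on an
anonymous mapping is only a stand-in for whatever advice the real code would
pass. It issues one madvise() per run of consecutive free pages, so scattered
pages need one call each, while a contiguous block needs a single call.

```python
import mmap

PAGE = mmap.PAGESIZE


def coalesce(page_numbers):
    """Group page numbers into (first_page, page_count) runs."""
    runs = []
    for page in sorted(set(page_numbers)):
        if runs and runs[-1][0] + runs[-1][1] == page:
            # This page directly follows the previous run: extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            # Gap before this page: start a new run.
            runs.append((page, 1))
    return runs


def release(mapping, page_numbers, advice=mmap.MADV_DONTNEED):
    """Issue one madvise() per run of free pages; return the call count."""
    runs = coalesce(page_numbers)
    for first, count in runs:
        mapping.madvise(advice, first * PAGE, count * PAGE)
    return len(runs)
```

With a 32-page mapping, releasing pages 1, 5 and 9 takes three calls, while
releasing pages 1, 2 and 3 takes one. Fragmentation avoidance would only make
the runs longer; the scheme works without it.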

> You may remember I asked about this two years ago:
> http://seclists.org/lists/linux-kernel/2003/Dec/0919.html

> And a reply indicated that SVr4 had it, but we don't. I assume the "naming
> discussion" mentioned in the recent thread already scrubbed through this
> old thread to determine that the SVr4 API was icky.
> http://seclists.org/lists/linux-kernel/2003/Dec/0955.html

I assume not everybody did (even if somebody pointed out the existence of the
SVr4 API), but there was a need, in at least one use case, for an API based on
virtual addresses rather than on file offsets as in the SVr4 one. With an
offset-based API, that user would have to implement backward mapping in
userspace just for this purpose, while we already have it in the kernel.
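
For illustration, this is roughly the bookkeeping an offset-based API would
push into userspace, sketched in Python. The Mapping record and
address_to_offset() are hypothetical names; a real user would have to keep
such a table in sync with every mmap() and munmap() it does.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int        # first virtual address of the mapping
    length: int       # size of the mapping in bytes
    file_offset: int  # offset in the backing file that `start` maps to


def address_to_offset(mappings, address) -> Optional[int]:
    """Translate a virtual address to a file offset, or None if unmapped.

    `mappings` must be sorted by start address and must not overlap.
    """
    starts = [m.start for m in mappings]
    # Find the last mapping that begins at or below `address`.
    i = bisect_right(starts, address) - 1
    if i < 0:
        return None
    m = mappings[i]
    if address >= m.start + m.length:
        return None  # address falls in the gap after this mapping
    return m.file_offset + (address - m.start)
```

An address-based call needs none of this, since the kernel already knows
which file and offset back each address.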

Anyway, the sys_punch() API will follow later - customers mainly need
madvise() for now.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade




