Re: [PATCH] Fix OOPS in mmap_region() when merging adjacent VM_LOCKED file segments

From: Linus Torvalds
Date: Tue Feb 03 2009 - 11:42:45 EST

On Tue, 3 Feb 2009, Lee Schermerhorn wrote:
>
> This reminded me of something I'd seen recently looking
> at /proc/<pid>/[numa]_maps for <a large commercial database> on
> Linux/x86_64:
>
> 2adadf2b9000-2adadf2c0000 rwxp 00000000 00:0e 4072 /dev/zero
>
> For portability between Linux and various Unix-like systems that don't
> support MAP_ANON*, perhaps?

Odd.

At first I thought it was just that Linux turns a MAP_SHARED | MAP_ANON
mapping into that /dev/zero thing, so you can't tell the difference by
looking at /proc/maps. So it would be very possible that the application did
not actually open /dev/zero at all, and used MAP_ANON instead (see the whole
shmem_zero_setup() and shmem_file_setup() thing).

But those mappings have that 'p' for private there, so it's not
MAP_SHARED. And yes, that means that your large commercial database really
did open /dev/zero and mapped it privately. They must be living in the
past.

> Anyway, from the addresses and permissions, these all look potentially
> mergeable. The offset is preventing merging, right? I guess that's one
> of the downsides of mapping /dev/zero rather than using MAP_ANONYMOUS?

Yeah. The MAP_ANON code has a total hack:

	case MAP_PRIVATE:
		/*
		 * Set pgoff according to addr for anon_vma.
		 */
		pgoff = addr >> PAGE_SHIFT;
		break;

where the whole point is to allow merging: since pgoff doesn't matter, we
can make it be something that will merge _if_ you don't play games (of
course, if you then start using mremap to move things around, that all
breaks, and you lose the merging ;)

That said, if it's just a hundred segments, nobody really cares. It's
going to make vma lookup fractionally slower, but not enough that anybody
would likely ever notice, even in benchmarks. And if it's just this one db,
it's certainly not going to use any noticeable amount of memory either.

Merging is important, but mainly for the _really_ common cases, and to
make /proc/maps more readable etc. It's not like it matters for the
occasional crazy setup.

But you could still try to teach the DB people to use MAP_ANON.

Linus